PREFACE

The object of this work is to provide a mathematical text on the 
Theory of Statistics, adapted to the needs of the student with an 
average mathematical equipment, including an ordinary knowledge 
of the Integral Calculus. The subject treated in the following 
pages is best described not as Statistical Methods but as Statistical 
Mathematics, or the mathematical foundations of the interpretation
of statistical data. The writer’s aim is to explain the underlying
principles, and to prove the formulae and the validity of the
methods which are the common tools of statisticians. Numerous
examples are given to illustrate the use of these formulae; but, in
nearly all cases, heavy arithmetic is purposely avoided in the desire 
to focus the attention on the principles and proofs, rather than on 
the details of numerical calculation. 

The treatment is based on a course of about sixty lectures on 
Statistical Mathematics, which the author has given annually in 
the University of Western Australia for several years. This course
was undertaken at the request of the heads of some of the science 
departments, who desired for their students a more mathematical 
treatment of the subject than that usually provided by courses
on Statistical Methods. The class has included graduates and
undergraduates whose researches and studies were in Agriculture, 
Biology, Economics, Psychology, Physics and Chemistry. On 
account of such a diversity of interest the lectures were designed 
to provide a mathematical basis, suitable for work in any of the 
above subjects. No technical knowledge of any particular subject 
was assumed. 

The first five chapters deal with the properties of distributions 
in general, and of some standard distributions in particular. It is 
desirable that the student become familiar with these, before 
being confronted with the theory of sampling, in which he is 
required to consider two or more distributions simultaneously. 
However, the reader who wishes to make an earlier start with
sampling theory may take Chapters vi and vii (as far as § 60)
immediately after Chapter iii, since these are independent of the
theory of Correlation. The theory of Partial and Multiple Correlations
has been left for the final chapter, in order not to delay the
study of sampling theory and tests of significance. But those who 
wish to study this subject earlier may read this chapter im- 
mediately after Chapter v, or even after Chapter iv. This order, 
however, is not recommended for the beginner. 

A feature of the book is the use of the properties of Beta and 
Gamma variates in proving the sampling distributions of the 
statistics, which are the basis of the common tests of significance. 
Consequently a special chapter (viii) is devoted to the properties 
of these variates and their distributions. The treatment is simple, 
and does not assume any previous acquaintance with the Beta and 
Gamma functions. The author believes that the use of these 
variates brings both simplicity and cohesion to the theory. The 
student is strongly urged to master the theorems of Chapter viii
before proceeding to tests of significance. In the preparation of 
this chapter, and the following one, much help was derived from 
the study of a recent paper by D. T. Sawkins.* 

The considerations, which determined the presentation of the 
subject of Probability in Chapter n, are the mathematical attain- 
ments of the students for whom the book is intended, and their 
requirements in studying the remaining chapters. The approach 
decided on is substantially that of the classical theory. After the 
proof of Bernoulli’s theorem, the relation between the a priori
definition of probability and the statistical (or empirical) definition 
is considered; and the measure of probability by a relative fre- 
quency in each case is emphasized. If the book had been intended 
primarily for mathematical specialists, a different presentation of 
the theory would have been given. But it is futile to expect the 
average research worker to appreciate an exposition like
Kolmogoroff’s† or Cramér’s,‡ based on the theory of completely
additive set functions. The elementary properties of the moment
generating function and the cumulative function are also given in
Chapter ii, and are used throughout the book. These functions are
introduced, not as essential concepts, but as useful instruments
which lead to simpler proofs of various theorems. What has been
written about them should convince the reader that they deserve
his careful attention.

* ‘Elementary presentation of the frequency distributions of certain
statistical populations associated with the normal population.’ Journ. and
Proc. Roy. Soc. N.S.W., vol. 74, pp. 209-39. By D. T. Sawkins, Reader in
Statistics at Sydney University.

† Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

‡ Random Variables and Probability Distributions, University Press,
Cambridge, 1937.

I wish to express my appreciation of the care bestowed on this 
book by the staff of the Cambridge University Press, and my
pleasure in the excellence of the printing. My thanks are due to 
Professor R. A. Fisher and Messrs Oliver and Boyd for permission 
to print Tables 3, 4, 5 and 7, which are drawn from fuller tables 
in Fisher’s Statistical Methods for Research Workers. I am also 
indebted to Professor G. W. Snedecor and the Iowa Collegiate 
Press for permission to reproduce Table 6, which is extracted from 
a more complete table in Snedecor’s Statistical Methods. Lastly 
I wish to thank Mr D. T. Sawkins for help received in correspondence
concerning statistical theory, and Mr Frank Gamblen
for assistance in reading the final proof. 

C.E.W. 


PERTH, W.A. 
April, 1940


NOTE ON THE SECOND EDITION 

The call for reprinting has given an opportunity to correct a 
number of small errors and misprints throughout the book, and 
to add a new reference here and there. Paragraph 91 on page 195 
is new to this edition. 


1949 


C.E.W. 




CHAPTER I 


FREQUENCY DISTRIBUTIONS 

1. Arithmetic mean. Partition values 

Consider a group of N persons in receipt of wages. Let x shillings 
be the wage of an individual on some specified day. Then, in general, 
$x$ will be a variable whose value changes with the individual.
Possibly the $N$ values of $x$ will not all be different. Suppose there are
only $n$ different values $x_1, x_2, \ldots, x_n$ which occur respectively
$f_1, f_2, \ldots, f_n$ times. The numbers $f_i$, in which the subscript $i$ takes the
positive integral values 1, 2, ..., $n$, are called the frequencies of the
values $x_i$ of the variable $x$; and the assemblage of values $x_i$, with their
associated frequencies, is the frequency distribution of wages for that
group of persons on the day specified. The sum of the frequencies is
clearly equal to the number of persons in the group, so that

$$N = f_1 + f_2 + \ldots + f_n = \sum_{i=1}^{n} f_i, \tag{1}$$

where, as the equation indicates, $\sum_{i=1}^{n} f_i$ denotes the sum of the $n$
frequencies $f_i$, $i$ taking the integral values 1 to $n$. When the range of
values of $i$ is understood, the sum will be denoted simply by $\sum_i f_i$ or
$\sum f_i$. The number $N$ is the total frequency. The mean of the distribution
is the arithmetic mean $\bar{x}$ of the $N$ values of the variable, and is
therefore given by

$$\bar{x} = \frac{1}{N}(f_1 x_1 + f_2 x_2 + \ldots + f_n x_n) = \frac{1}{N}\sum_i f_i x_i, \tag{2}$$

since the value $x_i$ occurs $f_i$ times. This formula expresses what is
meant by saying that $\bar{x}$ is the weighted mean of the different values
$x_i$, whose weights are their frequencies $f_i$.
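
The formulae (1) and (2) may be illustrated by a short computation. The following Python sketch is offered only as an illustration and is not part of the original treatment; the function names and the three wage values are hypothetical, chosen so that the arithmetic can be followed by eye.

```python
from fractions import Fraction

def total_frequency(freqs):
    """Equation (1): N is the sum of the frequencies f_i."""
    return sum(freqs)

def weighted_mean(values, freqs):
    """Equation (2): the mean is (1/N) times the sum of f_i * x_i."""
    n_total = total_frequency(freqs)
    return Fraction(sum(f * x for x, f in zip(values, freqs)), n_total)

# Hypothetical data: wages in shillings and the number of persons earning each.
wages = [10, 12, 15]
persons = [3, 5, 2]

N = total_frequency(persons)               # 3 + 5 + 2
mean_wage = weighted_mean(wages, persons)  # (30 + 60 + 30) / 10
print(N, mean_wage)
```

Exact fractions are used so that the weighted mean is not disturbed by rounding; for these data the total frequency is 10 and the mean wage is 12 shillings.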

Frequency distributions of many different variables will occur in 
the following pages. Thus the values of the variable x may be the 
heights, or the weights, or the ages of a group of persons, or the
yields of grain per acre from a number of plots of land. For each
finite distribution $f_i$ will denote the frequency of the value $x_i$. The
total frequency $N$ is then given by (1), and the mean $\bar{x}$ of the
distribution by (2).

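Example. The student to whom the above summation notation is new
may profitably verify the following relations. If $a$ is a constant,

$$\sum_i a f_i = a \sum_i f_i = Na,$$

$$\sum_i f_i (x_i - a) = \sum_i f_i x_i - Na,$$

$$\sum_i f_i (x_i - a)^2 = \sum_i f_i x_i^2 - 2a \sum_i f_i x_i + Na^2.$$

If $(x_i, y_i)$ is a pair of corresponding values of two variables, $x$ and $y$, with
frequency $f_i$,

$$\sum_i f_i (x_i + y_i) = \sum_i f_i x_i + \sum_i f_i y_i.$$

The symbol used as a subscript in connection with summation is immaterial;
but $i$, $j$, $r$, $s$, $t$ are perhaps most commonly employed.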

Suppose that the frequency distribution of x consists of k partial 
or component distributions, $\bar{x}_j$ being the mean of the $j$th component
and $n_j$ its total frequency, so that

$$N = \sum_{j=1}^{k} n_j.$$

Then that part of the sum $\sum_i f_i x_i$ which belongs to the $j$th component
has the value $n_j \bar{x}_j$, and the relation (2) is equivalent to

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{k} n_j \bar{x}_j. \tag{3}$$

Consequently the mean of the whole distribution is the weighted 
mean of the means of its components, the weights being the total 
frequencies in those components. 

Again, let u and v be two variables with frequency distributions 
in which a value of v corresponds to each value of u. Then the values 
of the variables occur in pairs. Let N be the number of pairs of 
values $(u_i, v_i)$, and let

$$x = u + v.$$

The arithmetic mean of the $N$ values of $x$ is equal to that of the $N$
values of the second member, which is $\bar{u} + \bar{v}$. Consequently

$$\bar{x} = \bar{u} + \bar{v}, \tag{4}$$

which expresses that the mean of the sum of two variables is equal
to the sum of their means; and the result can be extended to the sum
of any number of variables. The reader can prove similarly that, if
$a$ and $b$ are constants, and

$$x = au + bv,$$

then

$$\bar{x} = a\bar{u} + b\bar{v}. \tag{4'}$$

Suppose the N values of the variable in the distribution to be 
arranged in ascending order of magnitude. Then the median is the 
middle value, if N is odd; while, if N is even, it is the arithmetic 
mean of the middle pair, or, more generally, it may be regarded as 
any value in the interval between these middle values. Similarly, 
the quartiles, $Q_1$, $Q_2$, $Q_3$, are those values in the range of the variable
which divide the frequency into four equal parts, the second quartile
being identical with the median; and the difference between the
upper and lower quartiles, $Q_3 - Q_1$, is the interquartile range. The
deciles and percentiles are those values which divide the total frequency
into ten and one hundred equal parts respectively. The
median, quartiles, deciles and percentiles are often spoken of
collectively as partition values, since each set of values divides the
frequency into a number of equal parts. Sometimes they are
referred to as quantiles.

That value of the variable whose frequency is a maximum is
called a mode, or modal value, of the distribution. When, as usually
happens, there is only one mode, the distribution is said to be
unimodal.

We shall presently consider continuous frequency distributions. 
But it should be pointed out at once that the variable may be either 
continuous or discrete. A continuous variable is one which is capable
of taking any value between certain limits; for example, the stature
of an adult man. A discrete variable is one which can take only
certain specified values, usually positive integers; for example, the
number of heads in a throw of ten coins, or the number of accidents
sustained by a worker exposed to a given risk for a given time. Of 
course an observed frequency distribution can only contain a finite 
number of values of the variable, and in this sense all observed 
frequency distributions are discrete. Nevertheless, the distinction 
between continuous and discrete variables will be found to be of 
importance when we come to study populations and probability 
distributions. In the next few sections we shall assume that the 
values of the variables are discrete. 

2. Change of origin and unit

The following graphical representation will be found helpful. 
Taking the usual $x$-axis with origin $O$, we may represent the variable
$x$ by the abscissa of the current point $P$. Then $\bar{x}$ is the abscissa of a
fixed point $G$.

[Fig. 1: the points O, A, G, P marked in that order along the x-axis OX.]

It is frequently convenient to take a new origin at
some point $A$, whose abscissa is $a$. Let $\xi$ be the abscissa of $P$ relative
to $A$ as origin. Then, since $OP = OA + AP$, we have

$$x = a + \xi.$$

Thus $\xi$ is the excess of $x$ above $a$, or the deviation of $x$ from that
value. Taking the mean of each member of this equation we have

$$\bar{x} = a + \bar{\xi}. \tag{5}$$

Thus $\bar{x} - a$, which is the deviation of the mean value of $x$ from $a$, is
equal to $\bar{\xi}$, which is the mean of the deviations of the values $x_i$ from
$a$. In particular, by taking $a$ as $\bar{x}$, we have the result that the sum of
the deviations of the values $x_i$ from their mean is zero. This is also
easily proved directly; for

$$\sum_i f_i (x_i - \bar{x}) = \sum_i f_i x_i - N\bar{x} = 0 \tag{6}$$

in virtue of (2).

In addition to choosing $A$ as origin it may be convenient to use
a different unit, say $c$ times the original unit. Then, if $u$ is the
deviation of $P$ from $A$ measured in terms of this new unit,

$$u = (x - a)/c$$

or

$$x = a + cu. \tag{7}$$

Taking the mean value of the variable represented by either side
we have, in virtue of (4'),

$$\bar{x} = a + c\bar{u}, \tag{8}$$

$\bar{u}$ being the mean value of $u$ for the distribution.
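
The relations (6) and (8) are easily checked numerically. The sketch below, in Python, uses a small hypothetical distribution (the values, frequencies, origin and unit are assumptions made for illustration only) and exact fractions, so that the two sides of each relation can be compared without rounding.

```python
from fractions import Fraction

def mean(values, freqs):
    """Weighted mean of the values, the weights being the frequencies."""
    return Fraction(sum(f * v for v, f in zip(values, freqs)), sum(freqs))

# Hypothetical distribution.
x = [40, 45, 50, 55, 60]
f = [2, 5, 8, 4, 1]

a, c = 50, 5                              # new origin and new unit
u = [Fraction(xi - a, c) for xi in x]     # u = (x - a)/c, as in (7)

x_bar = mean(x, f)
u_bar = mean(u, f)

# Relation (6): the deviations from the mean sum to zero.
sum_of_deviations = sum(fi * (xi - x_bar) for xi, fi in zip(x, f))

print(x_bar, a + c * u_bar, sum_of_deviations)
```

The working mean $a$ is best taken near the centre of the distribution and $c$ equal to the common spacing of the values, so that the values of $u$ are small integers.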

Example. Eight coins were tossed together, and the number $x$ of heads
resulting was observed. The operation was performed 256 times; and the
frequencies that were obtained for the different values of $x$ are shown in the
following table. Calculate the mean, the median and the quartiles of the
distribution of $x$.

     x       f      ξ      fξ     fξ²     fξ³
     0       1     -4      -4      16     -64
     1       9     -3     -27      81    -243
     2      26     -2     -52     104    -208
     3      59     -1     -59      59     -59
     4      72      0       0       0       0
     5      52      1      52      52      52
     6      29      2      58     116     232
     7       7      3      21      63     189
     8       1      4       4      16      64
  Totals   256      -      -7     507     -37

The different values of $x$ are shown in the first column, and their frequencies
in the second. The calculation is simplified by taking the value $x = 4$ as
origin of $\xi$. The values of $\xi$ corresponding to those of $x$ are given in the third
column, and those of the product $f\xi$ in the fourth. The remaining columns are
not needed for the present. Totals for the various columns are given in the
bottom row. Hence

$$\bar{\xi} = \frac{1}{N}\sum_i f_i \xi_i = -7/256 = -0.027.$$

The mean value of $x$ is therefore

$$\bar{x} = a + \bar{\xi} = 4 - 0.027 = 3.973.$$

The mode, being the value of $x$ with the largest frequency, is clearly 4. To
find the median we observe that the values of $x$ are arranged in ascending
order, and that the 128th and 129th are both 4. Hence the median is also 4.
Similarly the 64th and 65th values are both 3, so that the lower quartile is 3.
In the same way we find that the upper quartile is 5.
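
The steps of this example can be sketched in Python as follows. The helper names are hypothetical, and the rule used for the partition values (the mean of the two observations on either side of the dividing point) is the one applied in the text for a total frequency divisible by four; it is an assumption of the sketch rather than a general definition.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

heads = list(range(9))                      # values of x: 0, 1, ..., 8
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]    # observed frequencies f

N = sum(freqs)
cumulative = list(accumulate(freqs))        # running totals of f

def value_at_rank(k):
    """The k-th smallest observation, counting from 1."""
    return heads[bisect_left(cumulative, k)]

def middle_of(k):
    """Mean of the k-th and (k+1)-th observations in ascending order."""
    return Fraction(value_at_rank(k) + value_at_rank(k + 1), 2)

a = 4                                       # working origin, as in the text
xi_bar = Fraction(sum(f * (x - a) for x, f in zip(heads, freqs)), N)
mean = a + xi_bar                           # relation (5)
mode = heads[freqs.index(max(freqs))]
median = middle_of(N // 2)
lower_quartile = middle_of(N // 4)
upper_quartile = middle_of(3 * N // 4)

print(float(mean), mode, median, lower_quartile, upper_quartile)
```

The cumulative frequencies 1, 10, 36, 95, 167, 219, 248, 255, 256 show at a glance in which value each ranked observation falls.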
3. Variance. Standard deviation

The mean square deviation of the variable x from the value a is, 
as the name implies, the mean value of the square of the deviation 

of $x$ from $a$. It is therefore given by $\frac{1}{N}\sum_i f_i (x_i - a)^2$. The positive square
root of this quantity is the root-mean-square deviation from $a$. In
the important case in which the deviation is taken from the mean of
the distribution, the mean square deviation is called the variance
of $x$, and is denoted by $\mu_2$. The reason for the notation will appear in
the next section. The positive square root of the variance is called
the standard deviation (s.d.) of $x$, and is denoted by $\sigma$. Thus

$$\mu_2 = \sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2. \tag{9}$$

The variance (or the s.d.) may be taken as an indication of the 
extent to which the values of x are scattered. This scattering is 
called dispersion. When the values of x cluster closely round the 
mean, the dispersion is small. When those values, whose deviations 
from the mean are large, have also relatively large frequencies, the 
dispersion is large. The concepts of variance and s.d. will play a 
prominent part in the following pages. 

When the mean square deviation from any value a is known, and 
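also the deviation $\bar{\xi}$ of the mean from that value, the variance is
easily calculated. For, since $x_i - \bar{x} = \xi_i - \bar{\xi}$,

$$\sigma^2 = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - \bar{\xi}^2. \tag{10}$$

This formula is of great importance, and will be constantly employed.
On multiplying by $N$ we have an equivalent relation, which may be
expressed

$$\sum_i f_i (x_i - a)^2 = N\sigma^2 + N(\bar{x} - a)^2. \tag{11}$$

Thus $N\sigma^2$ is less than $\sum_i f_i (x_i - a)^2$, showing that the sum of the squares of
the deviations of the values $x_i$ is least when the deviations are measured
from the mean.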
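
The use of formula (10), and the minimum property expressed by (11), may be sketched in Python with the coin-tossing frequencies of § 2. The function names are hypothetical; the point of the sketch is that the same variance is obtained whatever working origin $a$ is chosen, and that no origin gives a smaller mean square deviation than the mean itself.

```python
from fractions import Fraction

heads = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
N = sum(freqs)

def mean_square_deviation(a):
    """Mean value of the square of the deviation of x from a."""
    return Fraction(sum(f * (x - a) ** 2 for x, f in zip(heads, freqs)), N)

def variance_from_origin(a):
    """Formula (10): mean square deviation from a, less the square of xi-bar."""
    xi_bar = Fraction(sum(f * (x - a) for x, f in zip(heads, freqs)), N)
    return mean_square_deviation(a) - xi_bar ** 2

x_bar = Fraction(sum(f * x for x, f in zip(heads, freqs)), N)
variance = mean_square_deviation(x_bar)     # definition (9)

# Formula (10) gives the same result from any working origin.
from_four = variance_from_origin(4)
from_zero = variance_from_origin(0)

# Relation (11): deviations measured from the mean give the least sum of squares.
least_at_mean = all(mean_square_deviation(a) >= variance for a in range(9))

print(float(variance), from_four == variance, from_zero == variance, least_at_mean)
```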
Another possible measure of dispersion is the mean value of the
absolute deviation from the mean of the distribution, commonly 
called the mean deviation from the mean. This quantity, however, 
does not lend itself readily to algebraical treatment, and is therefore 
not nearly so important as the variance and the s.d. The semi- 
interquartile range is also sometimes taken as an indication of the 
magnitude of the dispersion. 

The significance of the magnitude of the standard deviation 
clearly depends upon the values of the variable. Thus a s.d. of 6 in.
in the measurements of the height of a tower, is much less significant 
than an equal s.d. in the measurements of the height of a man. The 
ratio of the s.d. to the mean value of the variable is called the 
coefficient of variation. It is an absolute measure of dispersion in the 
sense that it is independent of the unit employed. And by means of 
this coefficient we are able to compare the variabilities of distribu- 
tions of different characteristics, such as weight and height. Some- 
times the coefficient of variation is defined as 100 times the above 
value, i.e. as the percentage of the mean which is equal to the s.d. 

Example 1. In the example of the preceding section the mean square
deviation of $x$ from the value 4 is 507/256 = 1.98. Hence the variance is
given by

$$\sigma^2 = 1.98 - \bar{\xi}^2 = 1.98 - (0.027)^2 = 1.98,$$

and $\sigma = 1.407 = 1.41$ nearly.

From the $f\xi$ column it is clear that the sum of the absolute deviations from
$x = 4$ is 277. By measuring deviations from the mean, instead of from $x = 4$,
we increase the absolute deviations of 161 values, and decrease those of 95
values, by 0.027. Hence the sum of the absolute deviations from the mean is
277 + 66(0.027) = 278.78. The mean deviation is therefore 1.09 approximately.
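
The figures of this example may be checked by direct computation. The Python sketch below evaluates the standard deviation, the mean deviation from the mean and the coefficient of variation for the same coin-tossing frequencies; the variable names are hypothetical, and the line marked "short cut" merely repeats the argument of the text with the exact deviation 7/256 in place of 0.027.

```python
import math
from fractions import Fraction

heads = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
N = sum(freqs)

mean = Fraction(sum(f * x for x, f in zip(heads, freqs)), N)
variance = Fraction(sum(f * (x - mean) ** 2 for x, f in zip(heads, freqs)), N)
sd = math.sqrt(variance)

# Mean deviation: mean value of the absolute deviation from the mean.
mean_deviation = sum(f * abs(x - mean) for x, f in zip(heads, freqs)) / N

# Short cut: 161 values (x >= 4) move farther from the new origin and
# 95 values (x <= 3) move nearer, each by |mean - 4|.
short_cut = (277 + (161 - 95) * abs(mean - 4)) / N

coefficient_of_variation = sd / float(mean)

print(round(sd, 3), round(float(mean_deviation), 3), round(coefficient_of_variation, 3))
```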


Example 2. Find the mean and the variance for the distribution in which
the values of $x$ are the positive integers 1, 2, 3, ..., $N$, the frequency of each
being unity.

Here

$$\bar{x} = \frac{N(N+1)}{2N} = \tfrac{1}{2}(N+1).$$

The mean square deviation from $x = 0$ is

$$(1^2 + 2^2 + \ldots + N^2)/N = \tfrac{1}{6}(N+1)(2N+1).$$

Hence

$$\sigma^2 = \tfrac{1}{6}(N+1)(2N+1) - \tfrac{1}{4}(N+1)^2 = \tfrac{1}{12}(N^2 - 1).$$

Example 3. For the distribution expressed by

    x =  5   6   7   8   9  10  11  12  13  14  15
    f = 18  25  34  47  68  90  80  62  38  27  11

the total frequency is 500. Show that the mean value of $x$ is 10.054, the variance
5.58, the s.d. 2.36, the median 10, and the lower and upper quartiles 9
and 12 respectively. Also calculate the mean deviation from the mean as
in Ex. 1.

Example 4. A distribution consists of several component distributions. 
Express the variance of the whole distribution in terms of those of the 
components and the deviations of the means of the components from the 
general mean. 

Let $n_j$ be the frequency in the $j$th component, $\sigma_j$ its s.d., and $d_j = \bar{x}_j - \bar{x}$ the
deviation of its mean from the general mean. Then the mean square deviation
of this component from the general mean is $\sigma_j^2 + d_j^2$, and the sum of the
squares of its deviations from the general mean is $n_j(\sigma_j^2 + d_j^2)$. Hence the
variance of the whole distribution is given by

$$N\sigma^2 = \sum_j n_j(\sigma_j^2 + d_j^2),$$

where $N$ is the total frequency $\sum_j n_j$.
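
The result of Example 4, together with relation (3) for the general mean, can be verified on a small case. In the Python sketch below the two component samples are hypothetical, and the functions `summarize` and `combine` are names introduced here for illustration; the combined mean and variance obtained from the component summaries are compared with those computed directly from the pooled values.

```python
from fractions import Fraction

def summarize(data):
    """Return (frequency, mean, variance) of a list of values."""
    n = len(data)
    mean = Fraction(sum(data), n)
    variance = sum((x - mean) ** 2 for x in data) / n
    return n, mean, variance

def combine(components):
    """Combine (n_j, mean_j, variance_j) triples into the whole distribution."""
    n_total = sum(n for n, _, _ in components)
    # Relation (3): weighted mean of the component means.
    general_mean = sum(n * m for n, m, _ in components) / n_total
    # Example 4: N * sigma^2 is the sum of n_j * (sigma_j^2 + d_j^2).
    n_sigma_sq = sum(n * (v + (m - general_mean) ** 2) for n, m, v in components)
    return n_total, general_mean, n_sigma_sq / n_total

# Two hypothetical components.
first = [2, 4, 6]
second = [10, 12, 14, 16, 18]

combined = combine([summarize(first), summarize(second)])
direct = summarize(first + second)
print(combined, combined == direct)
```

Only the three summary figures of each component are needed, which is the practical value of the formula when the individual values are no longer available.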

4. Moments 

In the notation of the preceding sections the mean value of the
$r$th power of the deviation of the variable from the value $a$ is
$\frac{1}{N}\sum_i f_i \xi_i^r$. This is usually called the $r$th moment of the distribution
about the value $a$, or the moment of order $r$. The term ‘moment’ is
borrowed from Mechanics. Since $f_i/N$ is the relative frequency of the
value $x_i$ in the distribution, and the deviation $\xi$ from $a$ is represented
by the distance $AP$, the above expression can be regarded as the
sum of the $r$th moments of the relative frequencies about $A$. The
$r$th moment about the mean of the distribution is denoted by $\mu_r$.
The corresponding moment about a specified value other than the
mean, will be denoted* by $\mu'_r$. Thus

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r \tag{12}$$

is the $r$th moment about the mean, while

$$\mu'_r = \frac{1}{N}\sum_i f_i (x_i - a)^r = \frac{1}{N}\sum_i f_i \xi_i^r \tag{13}$$

* An alternative notation, with instead of /t', has some advantages. 
is the $r$th moment about the value $a$. Putting $r = 0$ we see that

$$\mu'_0 = \mu_0 = 1. \tag{14}$$

Similarly, in virtue of (2) and (6), we have

$$\mu'_1 = \bar{\xi}, \quad \mu_1 = 0. \tag{15}$$

The second moment about the mean is clearly the variance already 
discussed. 

By means of the binomial expansion, moments about the mean
of the distribution may be expressed in terms of moments about
any other value, $x = a$. Thus

$$\mu_r = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^r = \sum_{s=0}^{r} \binom{r}{s} (-\bar{\xi})^s \mu'_{r-s}, \tag{16}$$

$\binom{r}{s}$ denoting the binomial coefficient, often written ${}^rC_s$ or $C^r_s$. In
particular, in virtue of (14) and (15),

$$\mu_2 = \mu'_2 - \bar{\xi}^2, \tag{17}$$

in agreement with (10). Similarly

$$\mu_3 = \mu'_3 - 3\bar{\xi}\mu'_2 + 2\bar{\xi}^3, \qquad \mu_4 = \mu'_4 - 4\bar{\xi}\mu'_3 + 6\bar{\xi}^2\mu'_2 - 3\bar{\xi}^4, \tag{18}$$

and so on.

In calculating moments it is frequently convenient to change the
unit. As in § 2, let $u$ be the measure of the deviation from $x = a$, in
terms of a unit $c$ times the original unit, so that $\xi = cu$. Then the
$r$th moment of $x$ about $a$ is

$$\frac{1}{N}\sum_i f_i \xi_i^r = \frac{c^r}{N}\sum_i f_i u_i^r. \tag{19}$$

Thus the $r$th moment of the variable $x$ is $c^r$ times the corresponding
moment of the variable $u$.

A distribution is said to be symmetrical when the frequencies are
symmetrically distributed about the mean, that is to say, when
values equidistant from the mean have equal frequencies. For
example, the distribution expressed by 

    x = 0  1   2   3   4   5   6  7  8
    f = 1  8  28  56  70  56  28  8  1

is symmetrical about its mean x = 4. In the case of a symmetrical 
distribution there is the simplification that all the moments of odd 
order about the mean are equal to zero, since the terms of the sum 
in (12) cancel in pairs. In the case of an unsymmetrical distribution, 
the degree of departure from symmetry is called its skewness. 
More than one measure of this property has been proposed. One of 
the simplest is $\mu_3/\sigma^3$, while another is half this expression. These are
clearly independent of the unit chosen for the variable, and they 
vanish if the distribution is symmetrical. Another measure of 
skewness, proposed by Karl Pearson, will be given later. 

Example 1. For the distribution of $x$ in the example of § 2, the third moment
about $\xi = 0$ is

$$\mu'_3 = \frac{1}{N}\sum_i f_i \xi_i^3 = -37/256 = -0.145.$$

Hence the third moment about the mean is given by

$$\mu_3 = -0.145 - 3(-0.027)(1.98) + 2(-0.027)^3 = 0.018 \text{ nearly.}$$

The skewness, calculated from the formula $\mu_3/\sigma^3$, is

$$0.018/2.8 = 0.0064,$$

which is very small.

Example 2. For the distribution in § 3, Ex. 3, show that $\mu_3 = -1.92$, and
deduce that $\mu_3/\sigma^3 = -0.146$.
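
The passage from moments about an arbitrary value to moments about the mean can be sketched in Python as below. The sketch assumes that the expansion referred to in (16) is the ordinary binomial expansion of the $r$th power of the deviation, with the binomial coefficient $\binom{r}{s}$ multiplying $(-\bar{\xi})^s \mu'_{r-s}$; the function names are hypothetical, and the data are the coin-tossing frequencies of § 2 with working origin 4.

```python
from fractions import Fraction
from math import comb

heads = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
N = sum(freqs)

def moment_about(a, r):
    """The r-th moment of the distribution about the value a."""
    return Fraction(sum(f * (x - a) ** r for x, f in zip(heads, freqs)), N)

def central_from_raw(raw, r):
    """Binomial expansion: raw[s] is the s-th moment about a, raw[1] being xi-bar."""
    xi_bar = raw[1]
    return sum(comb(r, s) * (-xi_bar) ** s * raw[r - s] for s in range(r + 1))

a = 4
raw = [moment_about(a, s) for s in range(4)]    # 1, -7/256, 507/256, -37/256

mu2 = central_from_raw(raw, 2)
mu3 = central_from_raw(raw, 3)
skewness = float(mu3) / float(mu2) ** 1.5       # mu_3 / sigma^3

# Direct calculation about the mean, for comparison.
x_bar = a + raw[1]
direct_mu3 = moment_about(x_bar, 3)

print(float(mu3), skewness, mu3 == direct_mu3)
```

For $r = 2$ and $r = 3$ the expansion reduces to the particular cases quoted in the text, so the sketch also serves as a check on Example 1.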

5. Grouped distribution 

Frequently the number of different values of the variable repre- 
sented in the distribution is so large that, for convenience in cal- 
culating the moments, it becomes necessary to approximate by 
grouping the values. In such cases the range of variation of x is 
usually divided into a number of equal intervals. The group of 
values falling in a given interval constitutes a class; and the number 
of such values is the class frequency. The magnitude of an interval 
is called the class interval. For simplicity of calculation the number
of intervals chosen will not be too large, preferably not more than
20, or at the most 25; and, in order that the results may be sufficiently
accurate, the number must not be too small, preferably not
less than 12. The approximation in the calculation of moments
consists in regarding each value as equal to the mid-value of the
interval in which it falls. We shall later meet formulae,* known as
Sheppard's adjustments, giving the corrections that may be applied
under certain conditions to the approximate values of the moments
calculated as above. We merely mention here that, under the
conditions referred to, the correction to the mean may be neglected,
while the calculated variance should be reduced by $c^2/12$, $c$ being
the magnitude of the class interval. The following is an example of
the treatment of a distribution by grouping. 

* See § 16.

Example. Over a period of years, 570 students were examined in Mathematics I
at the annual examinations of the University of Western Australia.
The marks gained by students ranged from 0 to 99, all being integers. These
were grouped in 20 classes, with a class interval of 5, the class frequencies
being as shown in the accompanying table. The mid-values of the intervals
are 2, 7, 12, ..., 97. The calculation is simplified by taking the class interval
as a new unit ($c = 5$), and the value $x = 52$ as a new origin. The deviations $u$
from this origin, measured in class intervals, are shown in the fourth column.
The above choice of origin ensures that the larger frequencies are multiplied
by the smaller values of $u$. In the row corresponding to $u = 0$ the entries for
$fu$, $fu^2$, etc., are all zero, and need not be recorded. The space may therefore
be used to record the sums of the negative numbers above this row. The sums
of the positive numbers are similarly indicated at the bottom; and the total
for each column is given in the last row.

Percentages in Mathematics

  Interval    Mid-value     f      u       fu      fu²        fu³
   0 to  4        2        12    -10     -120    1,200    -12,000
   5 to  9        7        13     -9     -117    1,053     -9,477
  10 to 14       12        13     -8     -104      832     -6,656
  15 to 19       17        14     -7      -98      686     -4,802
  20 to 24       22        23     -6     -138      828     -4,968
  25 to 29       27        23     -5     -115      575     -2,875
  30 to 34       32        29     -4     -116      464     -1,856
  35 to 39       37        34     -3     -102      306       -918
  40 to 44       42        44     -2      -88      176       -352
  45 to 49       47        44     -1      -44       44        -44
  50 to 54       52        50      0   -1,042              -43,948
  55 to 59       57        52      1       52       52         52
  60 to 64       62        61      2      122      244        488
  65 to 69       67        41      3      123      369      1,107
  70 to 74       72        32      4      128      512      2,048
  75 to 79       77        27      5      135      675      3,375
  80 to 84       82        23      6      138      828      4,968
  85 to 89       87        17      7      119      833      5,831
  90 to 94       92        13      8      104      832      6,656
  95 to 99       97         5      9       45      405      3,645
                                          966               28,170
  Totals                  570      -      -76   10,914    -15,778

The mean value of $u$ is therefore

$$\bar{u} = \frac{1}{N}\sum_i f_i u_i = -76/570 = -2/15 = -0.133,$$

and the mean percentage

$$\bar{x} = a + c\bar{u} = 52 - 0.667 = 51.333.$$

The mean square deviation of $u$ from $u = 0$ has the value 10,914/570 = 19.15;
and the variance of $u$, which may be denoted by $\sigma_u^2$, is

$$\sigma_u^2 = 19.15 - \bar{u}^2 = 19.13.$$

Consequently $\sigma_u = 4.374$, and therefore $\sigma_x = 21.87$. If, however, Sheppard’s
adjustment is made, we find

$$\sigma_u^2 = 19.047, \quad \sigma_u = 4.365, \quad \sigma_x = 21.82.$$

The third moment of $u$ about $u = 0$ has the value -15,778/570 = -27.68.
Thus, in virtue of (18),

$$\mu_3 = -27.68 + 7.65 - 0.005 = -20.03.$$

The skewness of the distribution, calculated from the formula $\mu_3/\sigma^3$, is
-0.24 nearly. The reader may verify that the mean deviation of $x$ from the
mean is about 17.7.

The partition values may be estimated on the assumption that the frequency
in any class is evenly distributed among the values in that class. The
reader will find in this way that the median is 53, and the lower and upper
quartiles 37 and 66 respectively.
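
The whole of this calculation can be sketched in Python. Several points in the sketch are assumptions rather than statements of the text: the frequency 50 for the class 50 to 54 is inferred from the total of 570 and the column totals; the partition values are found by linear interpolation within a class whose boundaries are taken half a mark below and above its extreme values, which is one common way of spreading the frequency evenly; and all names are hypothetical.

```python
import math
from fractions import Fraction

c, a = 5, 52                                   # class interval and working origin
freqs = [12, 13, 13, 14, 23, 23, 29, 34, 44, 44,
         50, 52, 61, 41, 32, 27, 23, 17, 13, 5]
mids = [2 + c * k for k in range(20)]          # mid-values 2, 7, ..., 97
u = [(m - a) // c for m in mids]               # deviations in class intervals
N = sum(freqs)

def raw_moment_u(r):
    """The r-th moment of u about u = 0."""
    return Fraction(sum(f * ui ** r for ui, f in zip(u, freqs)), N)

u_bar = raw_moment_u(1)
mean_x = a + c * u_bar                         # relation (8)
var_u = raw_moment_u(2) - u_bar ** 2           # formula (10)
var_u_adjusted = var_u - Fraction(1, 12)       # Sheppard: c^2/12 with unit c
sd_x = c * math.sqrt(var_u)
sd_x_adjusted = c * math.sqrt(var_u_adjusted)

# Third moment about the mean, by the first relation of (18).
mu3_u = raw_moment_u(3) - 3 * u_bar * raw_moment_u(2) + 2 * u_bar ** 3
skewness = float(mu3_u) / float(var_u) ** 1.5

def partition_value(p):
    """Value below which a fraction p of the total frequency lies (interpolated)."""
    target = p * N
    running = 0
    for k, f in enumerate(freqs):
        if running + f >= target:
            lower_boundary = c * k - 0.5       # assumed class boundary
            return lower_boundary + c * float(target - running) / f
        running += f
    raise ValueError("p must lie between 0 and 1")

median = partition_value(Fraction(1, 2))
q1 = partition_value(Fraction(1, 4))
q3 = partition_value(Fraction(3, 4))
print(float(mean_x), round(sd_x, 2), round(sd_x_adjusted, 2), round(skewness, 2))
print(round(median), round(q1), round(q3))
```

Working in the unit $c$ keeps every sum an integer until the final division, which is the reason for the change of origin and unit in the text.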

6. Continuous distributions 

The distributions considered so far are discrete distributions, or 
distributions of discrete values of the variable. A continuous dis- 
tribution is one in which the variable takes every value between 
certain limits, $a$ and $b$. The total frequency is therefore infinite; and
so is the frequency within any finite interval, $\alpha$ to $\beta$, of the range of
the variable. We shall confine our attention to continuous distributions,
which are such that the relative frequency in the infinitesimal
interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$ is expressible as $f(x)\,dx$, where $f(x)$ is a
continuous function of $x$, called the relative frequency density. Then
the continuous curve

$$y = f(x) \tag{20}$$

is the relative frequency curve for the distribution. If this curve is
symmetrical about some line $x = c$, the distribution is said to be
symmetrical. The infinitesimal interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$, with
mid-value $x$ and magnitude $dx$, may be conveniently referred to as the
interval $dx$. The relative frequency $f(x)\,dx$ in this interval is represented
by the area under the curve (20), between the ordinates at
the ends of the interval. Hence the relative frequency for the
interval $\alpha$ to $\beta$ is given by the integral $\int_\alpha^\beta f(x)\,dx$. The sum of all the
relative frequencies is unity, so that

$$\int_a^b f(x)\,dx = 1. \tag{21}$$

As in the case of a discrete distribution, the moment of order $r$
about a specified value is the sum of the $r$th moments of the relative
frequencies about that value. Hence the mean $\bar{x}$, which is the first
moment about the origin, is given by

$$\bar{x} = \int_a^b x f(x)\,dx. \tag{22}$$

The moments of order $r$, about the origin and the mean, are

$$\mu'_r = \int_a^b x^r f(x)\,dx, \qquad \mu_r = \int_a^b (x - \bar{x})^r f(x)\,dx.$$

And, in virtue of the binomial expansion, it follows as in § 4 that
the relations (16)-(18) hold for a continuous distribution also. In
particular, the variance of the distribution is given by

$$\sigma^2 = \mu_2 = \int_a^b x^2 f(x)\,dx - \bar{x}^2. \tag{23}$$


Example 1. By way of illustration consider a straight rod of length $l$.
The distance $x$ of a molecule of the rod from one end may be regarded as a
continuous variable, ranging from 0 to $l$; and the distribution of the values
of $x$ is then continuous. If $M$ is the mass of the rod, and the linear density $m$
is continuous, the function $f(x)$ for the distribution has the value $m/M$.

In the case of a uniform rod, of length $2a$, the distance $x$ of a molecule from
the middle point varies continuously from $-a$ to $a$, and the distribution of
$x$ is continuous with $f(x) = 1/(2a)$. The frequency curve is a straight line parallel
to the $x$-axis, and the distribution of $x$ is said to be rectangular or uniform.
The mean value of $x$ is now zero, and the variance is given by

$$\sigma^2 = \int_{-a}^{a} \frac{x^2\,dx}{2a} = \frac{a^2}{3}.$$

All the moments of odd order about the mean are zero. The moment of order
$2n$ is easily shown to be $a^{2n}/(2n+1)$.

Example 2. Draw the frequency curve for the symmetrical distribution
in which

$$f(x) = \frac{2a}{\pi(a^2 + x^2)},$$

the range of the variable being $-a$ to $a$; and show that $f(x)$ satisfies the
condition (21). Also show that the variance is

$$\sigma^2 = a^2(4 - \pi)/\pi = 0.273a^2,$$

and the s.d. $\sigma = 0.52a$, while

$$\mu_4 = a^4(1 - 8/3\pi) = 0.151a^4.$$
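
The integrals of this section may be checked numerically. The Python sketch below uses a simple midpoint rule (any quadrature would serve) on the rectangular distribution of Example 1 and on a density assumed for Example 2, namely $f(x) = 2a/\{\pi(a^2 + x^2)\}$ on the range $-a$ to $a$; this form is an assumption of the sketch, adopted because it satisfies (21) and reproduces the variance and fourth moment stated in the example. The value $a = 2$ and all names are hypothetical.

```python
import math

def integrate(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def describe(f, lo, hi):
    """Total relative frequency (21), mean (22), variance (23) and fourth moment."""
    total = integrate(f, lo, hi)
    mean = integrate(lambda x: x * f(x), lo, hi)
    variance = integrate(lambda x: x * x * f(x), lo, hi) - mean ** 2
    mu4 = integrate(lambda x: (x - mean) ** 4 * f(x), lo, hi)
    return total, mean, variance, mu4

a = 2.0

def rectangular(x):
    return 1 / (2 * a)

def example_2(x):
    # Assumed density for Example 2 (see the remarks above).
    return 2 * a / (math.pi * (a * a + x * x))

rect = describe(rectangular, -a, a)
ex2 = describe(example_2, -a, a)

print(rect)   # compare with 1, 0, a^2/3, a^4/5
print(ex2)    # compare with 1, 0, a^2(4 - pi)/pi, a^4(1 - 8/(3 pi))
```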

It is apparent from the above that a discrete distribution cannot 
be identified with a continuous one. Sometimes, however, a discrete
distribution $D_1$ approximates to a certain continuous distribution $D_2$
in the sense that, for any interval in the range of the variable, the
relative frequency of $D_1$ is approximately equal to that of $D_2$.
Such an approximation requires that the total frequency $N$ of $D_1$
should be large, and that there should be only very small intervals
between pairs of adjacent values of the variable. Consider, for
example, the distribution of ages of all living people, not more than
(say) 60 years old at a specified instant. The variable $x$ ranges from
0 to 60 years; and the frequency in any interval, not less than a few
hours, is so large that, since a period of a few hours is very small
compared with 60 years, the distribution may be regarded as
continuous for all practical purposes. If the necessary data
were available, we could calculate a continuous function $f(x)$ to
give the relative frequency density to any reasonable degree of
accuracy. 
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The diagram illustrates the frequency curve of an nnsymraetrical, 
unimodal, continuous distribution. The mode of the distribution is 
the abscissa of the maximum ordinate N Since the area under 
the curve in any interval rej^resents the relative frequency for that 



interval, it follows that the median is the abscissa of the ordinate 
MM\ which bisects the area under the curve. And from (22) we see 
that the mean is the abscissa of the ordinate G0\ which passes 
through the centroid of the area under the curve. We may mention 
in passing that, when the skewness is small, the relation 

mean − mode = 3(mean − median)

holds fairly accurately. The student may regard this as an empirical 
relation, since we shall not give any proof. The definition of skewness 
most frequently used is that of Karl Pearson. It is 

$\text{skewness} = \dfrac{\text{mean} - \text{mode}}{\text{S.D.}}.$

When the mode can be accurately determined this definition is a 
convenient one. The frequency curve in the diagram is that of a 
positively skew distribution, the longer tail of the curve being to 
the right. 

The ratio of the fourth moment about the mean of a distribution,
to the square of the variance, is independent of the unit employed.
This invariant of the distribution is called its kurtosis, and is
frequently denoted by $\beta_2$. Thus

$\beta_2 = \mu_4/\mu_2^2.$    (24)

In the normal distribution, to be considered later, the kurtosis has
the value 3. Since this distribution is regarded as the standard or
ideal, the quantity $\beta_2 - 3$ for any distribution is called its excess of
kurtosis, or briefly its excess. Corresponding to the above notation,
$\beta_1$ is used to denote the invariant $\mu_3^2/\mu_2^3$. This is the square of the
skewness as defined in § 4.

EXAMPLES I

1. A distribution consists of three components with frequencies
of 200, 250 and 300, having means of 25, 10 and 15, and standard
deviations of 3, 4 and 5 respectively. Show that the mean of the
combined distribution is 16, and its s.d. 7.2 approximately.

2. The yields of grain (x lb.) from 500 small plots are grouped
in classes with a common class interval (0.2 lb.) in the table below,
the values of x given being the mid-values of the classes. Show
that the mean of the distribution is 3.95 lb., its s.d. 0.46 lb., the
median also 3.95 lb., and the quartiles 3.63 and 4.28 lb.


   x     f       x     f       x     f       x     f       x     f
  2.8    4      3.4   47      4.0   88      4.6   35      5.2    4
  3.0   15      3.6   63      4.2   69      4.8   10
  3.2   20      3.8   78      4.4   59      5.0    8

(Mercer and Hall, Journ. Agric. Sci., 1911, vol. 4, p. 107.)


* The references are to the literature listed at the end of the book. The
student is strongly advised to read some of the literature mentioned at the
end of each chapter, preferably in the order given. He will thus acquire a
better background for the mathematical theory.

3. The wages of 1,000 employees range from 4s. 6d. to 19s. 6d.
They are grouped in 15 classes with a common class interval of 1s.,
and the class frequencies, from the lowest class to the highest, are
6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6. Tabulate
the data, and show that the mean wage is 12.006s., the s.d.
2.626s. = 2s. 7½d., and the median 12.127s. = 12s. 1½d., and the
mode 12.369s. = 12s. 4½d. nearly. (Adjusted s.d. 2.61s.)
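
The arithmetic of Example 3 may be checked by machine. The following is a
minimal sketch only: it assumes that the class mid-values are 5s., 6s., ...,
19s. (classes of width 1s. running from 4s. 6d. to 19s. 6d.), and that the
adjusted s.d. is obtained by Sheppard's adjustment, subtracting one-twelfth
of the squared class interval from the variance. The variable names are
illustrative.

```python
from math import sqrt

# Class frequencies from Example 3, lowest class first.
freqs = [6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6]
# Assumed mid-values in shillings: 5, 6, ..., 19.
mids = list(range(5, 20))
h = 1.0  # common class interval, in shillings

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / N
var = sum(f * x * x for f, x in zip(freqs, mids)) / N - mean ** 2
sd = sqrt(var)
# Sheppard's adjustment subtracts h^2/12 from the grouped variance.
sd_adjusted = sqrt(var - h * h / 12)

print(N, round(mean, 3), round(sd, 3), round(sd_adjusted, 2))
```

The median and mode are not computed here, since they depend on the
interpolation rule adopted within the central classes.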

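4. With the notation of § 4, and by a method similar to that used
in proving (16), show that

$\mu'_4 = \mu_4 + 4\mu_3\bar{x} + 6\mu_2\bar{x}^2 + \bar{x}^4,$

$\mu'_r = \mu_r + \binom{r}{1}\mu_{r-1}\bar{x} + \binom{r}{2}\mu_{r-2}\bar{x}^2 + \ldots + \binom{r}{2}\bar{x}^{r-2}\mu_2 + \bar{x}^r.$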

5. The first three moments of a distribution about the value 2
of the variable are 1, 16 and −40. Show that the mean is 3, the
variance 15, and $\mu_3 = -86$. Also show that the first three moments
about x = 0 are 3, 24 and 76.
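
The figures of Example 5 can be verified by shifting the origin with the
binomial theorem. The sketch below is one possible arrangement, not the
method of the text; the function name `shift_moments` and the list layout
(moment of order r stored at index r, with the zeroth moment equal to 1)
are assumptions made for illustration.

```python
from math import comb

def shift_moments(moments, d):
    """Moments about b from moments about a, where d = a - b.

    moments[r] is the r-th moment about a (moments[0] must be 1).
    Since x - b = (x - a) + d, each new moment follows by expanding
    the r-th power with the binomial theorem.
    """
    return [sum(comb(r, k) * moments[k] * d ** (r - k) for k in range(r + 1))
            for r in range(len(moments))]

about_2 = [1, 1, 16, -40]          # moments of order 0..3 about x = 2
mean = 2 + about_2[1]              # the first moment about 2 locates the mean
about_mean = shift_moments(about_2, 2 - mean)
about_zero = shift_moments(about_2, 2)
print(mean, about_mean[1:], about_zero[1:])
```

The first moment about the mean comes out as zero, which serves as a check
on the shift.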

6. Prove that the mean deviation from the median is less than
that measured from any other value. (See Aitken, 1939, 1, p. 32.)

7. Show that, if the class interval of a grouped distribution is
less than one-third of the calculated s.d., Sheppard's adjustment
makes a difference of less than ½% in the estimate of the s.d.

8. Show that, if the variable takes the values 0, 1, 2, 3, ..., n
with frequencies proportional to the binomial coefficients
$1, n, \binom{n}{2}, \ldots, n, 1$ respectively, then the mean of the
distribution is ½n, the second moment about x = 0 is n(n + 1)/4, and the
variance is ¼n.

9. In a continuous distribution, whose relative frequency density
is given by f(x) = 3x(2 − x)/4, the variable ranges from 0 to 2. Show
that the distribution is symmetrical, with mean $\bar{x} = 1$, and variance
1/5. Show that the second and third moments about x = 0 are 6/5
and 8/5 respectively; and verify that $\mu_3 = 0$.

10. Exponential distribution. Consider the continuous distribution
in which $f(x) = ae^{-ax}$, a being positive, and the variable
ranging from 0 to ∞. Show that the mean is 1/a and the variance
1/a². Also prove that the second and third moments about x = 0
are 2/a² and 6/a³ respectively, and that $\mu_3 = 2/a^3$.

11. Factorial moments. The factorial moments of a distribution
are defined as follows. That of order r about the origin (x = 0) is

$\mu'_{(r)} = \frac{1}{N}\sum_i f_i x_i^{(r)},$

where $x^{(r)} = x(x-1)(x-2)\ldots(x-r+1)$.

Show that factorial moments are related to ordinary moments by
the equations

$\mu'_{(2)} = \mu'_2 - \bar{x},$
$\mu'_{(3)} = \mu'_3 - 3\mu'_2 + 2\bar{x},$
$\mu'_{(4)} = \mu'_4 - 6\mu'_3 + 11\mu'_2 - 6\bar{x},$
and so on. Similarly

$\mu'_2 = \mu'_{(2)} + \bar{x},$
$\mu'_3 = \mu'_{(3)} + 3\mu'_{(2)} + \bar{x},$
$\mu'_4 = \mu'_{(4)} + 6\mu'_{(3)} + 7\mu'_{(2)} + \bar{x}.$

The expression for the factorial moment about the mean is
obtained from that defining $\mu'_{(r)}$ by replacing x by the deviation
$x - \bar{x}$ from the mean. Relations corresponding to the above are
obtained by dropping the dashes and putting $\bar{x} = 0$. The student
may then easily verify that factorial moments about the mean are
connected with those about the origin by the equations

$\mu_{(2)} = \mu'_{(2)} - \bar{x}^2 + \bar{x},$

$\mu_{(3)} = \mu'_{(3)} - 3\mu'_{(2)}\bar{x} + 2\bar{x}^3 - 2\bar{x},$

$\mu_{(4)} = \mu'_{(4)} - 4\bar{x}\mu'_{(3)} + 6\bar{x}(\bar{x}+1)\mu'_{(2)} - 3\bar{x}(\bar{x}^3 + 2\bar{x}^2 - \bar{x} - 2).$
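
The relations of Example 11 hold for any distribution, and may be tried on
a small case before a general proof is attempted. The sketch below uses a
hypothetical frequency table (values 0 to 4 with frequencies 1, 4, 6, 4, 1),
chosen only for illustration; the helper names are likewise assumptions.
Exact fractions are used, so that the equalities are tested without
rounding.

```python
from fractions import Fraction

# Hypothetical frequency distribution, chosen only for illustration.
values = [0, 1, 2, 3, 4]
freqs = [1, 4, 6, 4, 1]
N = sum(freqs)

def falling(x, r):
    """x^(r) = x(x-1)...(x-r+1), with x^(0) = 1."""
    out = Fraction(1)
    for k in range(r):
        out *= x - k
    return out

def moment(r, origin=0):
    """Ordinary r-th moment about `origin`."""
    return sum(Fraction(f) * (x - origin) ** r for x, f in zip(values, freqs)) / N

def factorial_moment(r, origin=0):
    """Factorial r-th moment about `origin`."""
    return sum(f * falling(x - origin, r) for x, f in zip(values, freqs)) / N

xbar = moment(1)
f2, f3, f4 = (factorial_moment(r) for r in (2, 3, 4))

# Factorial moments about the origin in terms of ordinary moments.
assert f3 == moment(3) - 3 * moment(2) + 2 * xbar
assert f4 == moment(4) - 6 * moment(3) + 11 * moment(2) - 6 * xbar
# Factorial moments about the mean in terms of those about the origin.
assert factorial_moment(2, xbar) == f2 - xbar**2 + xbar
assert factorial_moment(3, xbar) == f3 - 3 * f2 * xbar + 2 * xbar**3 - 2 * xbar
assert factorial_moment(4, xbar) == (f4 - 4 * xbar * f3 + 6 * xbar * (xbar + 1) * f2
                                     - 3 * xbar * (xbar**3 + 2 * xbar**2 - xbar - 2))
print("all relations hold; mean =", xbar)
```

A numerical check of this kind does not replace the algebraic proof asked
for, but it will expose a wrong sign or coefficient at once.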



CHAPTER II


PROBABILITY AND PROBABILITY 
DISTRIBUTIONS 

7. Explanation of terms. Measure of probability 

We begin by approaching the subject of probability from the 
point of view of the classical theory. Later in the chapter we shall 
give the statistical or empirical approach, and the relation between 
the two will become apparent. We hope in this way to provide a 
simple introduction to the subject, adapted to the needs of those 
for whom the book is written. 

The throwing of an ordinary cubical die may result in any one 
of six different cases, in the sense that any one of the six faces may 
be uppermost when the die comes to rest. This group of six cases is 
exhaustive, because it includes all possible cases that may result.
The different cases are mutually exclusive, since no two faces can 
be uppermost at the same time. The throwing of the die may be 
referred to as a trial. In general a trial is the establishing of certain 
conditions, which must produce one of several results or cases. Two 
such cases are said to be mutually exclusive when the happening of 
one of them precludes the happening of the other; and a group of 
cases is said to be exhaustive when it includes all possible ones. The
cases favourable to an event A are those that entail the
happening of A. For example, in the throwing of the die three of
the cases are favourable to the appearance of an even number, and 
two are favourable to the appearance of a multiple of 3. 

Suppose that our die is a perfect cube made of homogeneous 
material, and that the marking of the faces has not made it dynamic- 
ally unsymmetrical. Then there is no reason to expect that, as the 
result of an unbiased throw, any particular face will come uppermost 
rather than any other. We say that the six cases are equally likely,
or equally probable. Similarly the 52 cases that may result from
the drawing of a card without discrimination from an ordinary pack
are equally likely. And, in general, two events are said to be equally
likely if, after all relevant evidence has been taken into account, one
of them may not be expected rather than the other.*

The terms probable and probability are used in ordinary language 
in many different connections; and the reader will agree that, in a 
great many instances, it is useless to attempt a numerical estimate 
of the probability under consideration. For instance, we cannot 
give a numerical value to the probability that a certain man’s 
political or religious beliefs are correct, or that a statement by a 
perfect stranger is true. There is, however, a very large group of 
questions in connection with which a numerical estimate of prob- 
ability may be attempted with very useful results. We shall first 
state the method of measurement adopted in the classical theory, 
and then give examples of problems to which the method is applic- 
able. 

Measure of probability. If a trial may result in any one of n
exhaustive, mutually exclusive and equally likely cases, and m of these
are favourable to an event A, then the probability (or chance) that A
will happen as the result of the trial is measured by the quotient m/n.

This measure of the probability of the event A is denoted by p.
Thus

$p = m/n,$    (1)

and p is clearly a positive number, not greater than unity. The
opposite event is understood as the failure of A to happen; and its
probability is denoted by q. Since the number of cases involving the
failure of A is n − m, it follows that

$q = (n - m)/n = 1 - m/n = 1 - p,$

so that $p + q = 1.$    (2)

This method of measuring probability is confined to problems in
which the results of a trial are reducible to a certain number of
equally likely cases. Considerations of symmetry and similarity
frequently enable us to decide whether, in the problem before us,
the resulting cases are of this nature; and, only if they are so, is the
calculation valid. Objection is often raised to the use of the idea of
equally probable cases in defining the measure of probability. The
objection loses some of its force if we remember that, in the
corresponding concept of temperature, there are methods of recognizing
equality of temperature which are quite independent of the
measurement either of absolute temperature or of difference in temperature.

* Cf. Uspensky, 1937, 2, p. 5.

When we speak of choosing one at random from a group of n 
objects, we mean that the choice is made in such a way that each 
object has the same chance of being selected. Various methods have 
been devised for making such a choice; and these do not assume any 
similarity on the part of the objects in the group. One such method 
is illustrated by the procedure often adopted in the case of a lottery.
The objects are numbered 1, 2, ..., n, and n similar marbles are
numbered correspondingly. The marbles are placed in an urn, and 
thoroughly mixed; and one of them is then chosen without dis- 
crimination. The number on this marble determines the object 
selected. The method thus uses the similarity of the marbles to 
ensure that the selection is random. Statisticians have devised 
other methods of random selection; but it is unnecessary for us to 
examine them. 


Example 1. The chance of throwing a 6 with an ordinary die is 1/6. The
chance of throwing an odd number is 3/6 = 1/2.


Example 2. In a class of 12 pupils, 5 are boys and the remainder girls.
The probability that a pupil selected at random will be a girl is 7/12.
The probability that two pupils selected at random will both be girls is

$\binom{7}{2}\Big/\binom{12}{2} = \frac{21}{66} = \frac{7}{22}.$

The odds against their being both girls are 15:7.



Example 3. From each of three married couples one of the partners is
selected at random. What is the probability of their being all of one sex?

The number of favourable cases is two, viz. all men or all women. The
total number of equally likely cases is 2³ = 8. Hence the required probability
is 1/4. The odds against are therefore 3:1.

The probability of choosing two men and one woman is 3/8.
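
The classical measure m/n lends itself to direct enumeration when the number
of cases is small. As an illustration, the following minimal sketch lists the
equally likely selections of Example 3 and counts the favourable ones; the
function name `prob` and the letters "M" and "W" are choices made here, not
notation of the text.

```python
from fractions import Fraction
from itertools import product

# One partner is chosen from each of three couples: 2^3 equally likely cases.
cases = list(product("MW", repeat=3))

def prob(event):
    """Classical measure m/n: favourable cases over all equally likely cases."""
    favourable = sum(1 for case in cases if event(case))
    return Fraction(favourable, len(cases))

p_same_sex = prob(lambda c: len(set(c)) == 1)
p_two_men_one_woman = prob(lambda c: c.count("M") == 2)
print(p_same_sex, p_two_men_one_woman)
```

The enumeration is valid only because each of the eight selections is
equally likely, which is the condition laid down in the definition.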


8. Theorems of total and compound probability 

The determination of probabilities by direct enumeration of the
number of cases is often laborious. The calculation may be simplified
by using the theorems of addition and multiplication of probabilities,
which are also known as the theorems of total and compound
probability. Let us first consider addition.

Suppose that a trial may result in any one of n equally likely cases,
of which $m_1$ are favourable to an event $A_1$, $m_2$ to an event $A_2$, $m_3$
to an event $A_3$, and so on. If the events $A_1, A_2, \ldots, A_k$ are mutually
exclusive, the corresponding favourable cases are all different. The
number of cases favourable to either $A_1$ or $A_2$, ... or $A_k$ is therefore
$m_1 + m_2 + \ldots + m_k$. Hence the probability that one of these events
will happen is

$\frac{m_1 + m_2 + \ldots + m_k}{n} = \frac{m_1}{n} + \frac{m_2}{n} + \ldots + \frac{m_k}{n} = \sum_{i=1}^{k} p_i,$

where $p_i$ is the probability of the event $A_i$. We may therefore state
the theorem of addition of probabilities:

The probability that one of several mutually exclusive events will
happen is the sum of the probabilities of the separate events.

In particular, if the k events $A_i$ are exhaustive, so that they include
all the mutually exclusive cases that may arise, the sum of the k
numbers $m_i$ is equal to n, and therefore $\sum p_i = 1$. Thus the sum of the
probabilities of the exhaustive and mutually exclusive cases that
may result from the trial is equal to unity. From the above
argument it is also clear that, if an event may occur in several mutually
exclusive forms, the probability of the event is the sum of the
probabilities of its mutually exclusive forms.


Example 1. What is the probability of obtaining a total of 9 points in a
single throw with two dice?

The event may happen in one of the four mutually exclusive forms, 6 and 3,
5 and 4, 4 and 5, 3 and 6, the chance of each of these being 1/36. Hence the
required probability is 1/9.

Show that the probability of a total of 7 points is 1/6, and that of a total 
of 10 points is 1/12. 

Example 2. From a set of 17 cards, numbered 1, 2, ..., 17, one is drawn at
random. What is the chance that its number is a multiple of 3 or of 7?

The chance of its being a multiple of 3 is 5/17, and that of a multiple of 7
is 2/17. The required probability is therefore 7/17, the two events being
mutually exclusive.

Show that the chance that the number will be a multiple of 3 or of 5, or
of both, is also 7/17.
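
The two examples on the addition theorem can be checked by counting cases.
The sketch below is an illustration under the classical definition only; the
helper `prob` and the variable names are assumptions. It also shows where the
theorem must not be applied directly: multiples of 3 and multiples of 5 are
not mutually exclusive among the 17 cards, so the second result is found by
counting the union and not by adding two probabilities.

```python
from fractions import Fraction
from itertools import product

def prob(cases, event):
    """m/n for a list of equally likely cases."""
    return Fraction(sum(1 for c in cases if event(c)), len(cases))

# Example 1: the 36 equally likely results of throwing two dice.
throws = list(product(range(1, 7), repeat=2))
p_total = {t: prob(throws, lambda c, t=t: sum(c) == t) for t in (7, 9, 10)}

# Example 2: one card drawn from cards numbered 1..17.
cards = list(range(1, 18))
p3 = prob(cards, lambda c: c % 3 == 0)
p7 = prob(cards, lambda c: c % 7 == 0)
p3_or_7 = prob(cards, lambda c: c % 3 == 0 or c % 7 == 0)
p3_or_5 = prob(cards, lambda c: c % 3 == 0 or c % 5 == 0)

print(p_total, p3, p7, p3_or_7, p3_or_5)
```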


Consider next a pair of related trials. Suppose, for instance, that
we have an urn containing r red and s white balls, and that we make
a random drawing of two balls in succession. The first trial is the
drawing of the first ball; and, if this proves to be red, we shall say
that the event A has happened. Similarly, the second trial is the
drawing of the second ball, without replacing the first; and, if this
proves to be red, we shall say that the event B has happened. The
probability of A is clearly r/(r + s). That of B depends on whether A
has happened or not. Thus, if A has happened at the first trial, the
probability of B in the second is (r − 1)/(r + s − 1); but, if A has not
happened, it is r/(r + s − 1). The former of these is the conditional
probability of B on the assumption that A has happened. Thus the
two events, A and B, are not independent. Two events are said to
be independent only when the probability of either of them is not
affected by the happening or failure of the other. By way of
illustration consider a combined trial consisting of the throwing of two
ordinary dice, either together or in succession. If the event A is the
throwing of a 6 with the first die, and the event B the throwing of
a 6 with the other, the probability of B is 1/6, whether A has
happened or not. Events A and B are thus independent.
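
The dependence of B on A in the urn illustration may be seen by listing the
ordered pairs of balls. The following sketch takes r = 3 and s = 2 purely as
illustrative numbers (they are not given in the text at this point), and the
function name `second_red_given` is likewise an assumption.

```python
from fractions import Fraction
from itertools import permutations

r, s = 3, 2  # illustrative numbers of red and white balls
balls = ["R"] * r + ["W"] * s

# Ordered draws of two distinct balls: (r+s)(r+s-1) equally likely cases.
draws = list(permutations(range(r + s), 2))
first_red = [d for d in draws if balls[d[0]] == "R"]
first_white = [d for d in draws if balls[d[0]] == "W"]

def second_red_given(cases):
    """Conditional probability of B (second ball red) within the given cases."""
    return Fraction(sum(1 for d in cases if balls[d[1]] == "R"), len(cases))

p_A = Fraction(len(first_red), len(draws))
p_B_given_A = second_red_given(first_red)
p_B_given_not_A = second_red_given(first_white)
print(p_A, p_B_given_A, p_B_given_not_A)
```

The two conditional probabilities differ, which is precisely the statement
that A and B are not independent.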

More generally, suppose that a trial or a combination of trials
may result in any one of n equally likely cases, some of which entail
the happening of A alone, some that of B alone, and others that of
both A and B. Let m be the number favourable to the event A.
The cases favourable to both A and B are all included in these m.
Let $m_1$ be their number. Then the probability p that both A and B
will happen is given by

$p = \frac{m_1}{n} = \frac{m}{n}\cdot\frac{m_1}{m}.$

The first quotient, m/n, is the probability of the event A. Also,
since m is the number of cases that entail A, and $m_1$ is the number
of these that entail B also, $m_1/m$ is the conditional probability of
B on the assumption that A has happened. Hence we have the
theorem of compound probability:

The probability of the combined occurrence of two events, A and B,
is the product of the probability of A by the conditional probability
of B on the assumption that A has happened.

In the case of independence the result may be stated simply, that
the probability that two independent events will both happen is
the product of the probabilities of the separate events. The theorem
may clearly be extended to include any number of events.

Example 3. A coin is tossed three times. Find the chance that head and
tail will show alternately.

The required event has two mutually exclusive forms, viz. head-tail-head
and tail-head-tail. By the above theorem the chance of either of these is
$(\tfrac{1}{2})^3$; and by the theorem of total probability the required chance is then 1/4.

Example 4. Three groups of children contain respectively 3 girls and
1 boy, 2 girls and 2 boys, 1 girl and 3 boys. One child is selected at random
from each group. Find the chance that the three selected comprise 1 girl
and 2 boys.

The event may happen in any of the mutually exclusive ways girl-boy-boy,
boy-girl-boy, boy-boy-girl. The probabilities of these three are
$\tfrac{3}{4}\cdot\tfrac{2}{4}\cdot\tfrac{3}{4}$, $\tfrac{1}{4}\cdot\tfrac{2}{4}\cdot\tfrac{3}{4}$, $\tfrac{1}{4}\cdot\tfrac{2}{4}\cdot\tfrac{1}{4}$
respectively. The required probability is their sum, which is 13/32.
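
Example 4 combines both theorems: multiplication within each order of
selection, addition over the mutually exclusive orders. A minimal sketch of
that calculation follows; the data layout (a list of girl and boy counts
for each group) and the names used are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# (girls, boys) in each of the three groups of Example 4.
groups = [(3, 1), (2, 2), (1, 3)]

total = Fraction(0)
for sexes in product("GB", repeat=3):
    if sexes.count("G") != 1:
        continue  # keep only selections with exactly 1 girl and 2 boys
    # Compound probability: the three selections are independent.
    p = Fraction(1)
    for (girls, boys), sex in zip(groups, sexes):
        p *= Fraction(girls if sex == "G" else boys, girls + boys)
    total += p  # total probability: the orders are mutually exclusive
print(total)
```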

9. Probability distributions. Expected value 

Suppose that, corresponding to the n exhaustive and mutually
exclusive cases that may result from a trial, a variable x assumes the
n values $x_i$ with corresponding probabilities $p_i$. Then the assemblage
of values $x_i$ with their probabilities $p_i$ constitutes the probability
distribution of the variable for that trial. A variable, which possesses
a probability distribution, is often called a variate. Most of the
concepts introduced in connection with frequency distributions are
equally applicable to probability distributions. Thus the rth
moment, $\mu'_r$, of the distribution about the value x = 0 is defined by
the equation

$\mu'_r = \sum_i p_i x_i^r,$    (3)

probability taking the place of relative frequency since, as proved
in § 8, $\sum p_i = 1$. In particular, the first moment about x = 0 is

$\mu'_1 = \sum_i p_i x_i.$    (4)

Just as in the case of frequency this moment is called the mean value
of the variate, or of the distribution. More commonly, however, it
is referred to as the expected value or expectation of the variate. It
is denoted by E(x), or sometimes by $\bar{x}$, though the former is
preferable. Thus


$E(x) = \sum_i p_i x_i.$    (5)

Let $\psi(x)$ be a function of the variate x. This assumes the value
$\psi(x_i)$ when x assumes the value $x_i$; and $p_i$ is therefore the probability
of this value of the function. The expected value of the function is
thus given by

$E[\psi(x)] = \sum_i p_i\psi(x_i).$    (6)

In particular (3) expresses that $\mu'_r$ is the expected value of the rth
power of the variate.

The rth moment about the mean of the probability distribution
is defined by a formula corresponding to § 4 (12). Thus

$\mu_r = \sum_i p_i(x_i - \bar{x})^r = E([x - E(x)]^r).$    (7)

In particular, the second moment about the mean is the variance of
the distribution; and its positive square root is the standard
deviation, σ. The relation

$\mu_2 = \mu'_2 - \bar{x}^2$    (8)

holds as before. It may be expressed in the alternative notation

$E([x - E(x)]^2) = E(x^2) - [E(x)]^2.$    (8')

The relations between $\mu_r$ and $\mu'_r$ are the same as for a frequency
distribution; while, corresponding to § 2 (6), we have the identity

$E[x - E(x)] = 0.$    (9)

In words, the expected value of the deviation of a variate from its mean
is zero.
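
The definitions of this section translate directly into a few lines of code.
The sketch below is illustrative only: it represents a probability
distribution as a mapping from each value to its probability (an assumption
of the sketch, as are the function names), and applies the definitions to an
ordinary die.

```python
from fractions import Fraction

def expect(dist, psi=lambda x: x):
    """E[psi(x)] = sum of p_i * psi(x_i); dist maps each x_i to its p_i."""
    return sum(p * psi(x) for x, p in dist.items())

def central_moment(dist, r):
    """r-th moment about the mean: E([x - E(x)]^r)."""
    mean = expect(dist)
    return expect(dist, lambda x: (x - mean) ** r)

# An ordinary die: values 1..6, each with probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expect(die)
variance = central_moment(die, 2)
print(mean, variance)
```

Computing the variance both as the second moment about the mean and as
the mean square less the squared mean gives the same fraction, and the first
moment about the mean vanishes.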

Example 1. What is the expected value of the number of points that will 
be obtained in a single throw with an ordinary die? 

Here the variate is the number of points showing. It assumes the values
1, 2, ..., 6 with probability 1/6 in each case. Hence

E(x) = (1 + 2 + ... + 6)/6 = 7/2 = 3.5.

Example 2. From an urn, containing 3 red balls and 2 white, a man is to
draw two balls at random without replacement, being promised 20s. for each
red ball he draws, and 10s. for each white one. Find his expectation.

For the possible results of the drawing there are three exhaustive and
mutually exclusive cases, viz. 2 red balls, 1 red and 1 white, and 2 white.
The corresponding probabilities are easily shown to be 3/10, 3/5 and 1/10.
The variable is the amount to be given to the man on the result of the draw;
and this has the value 40s. in the first case, 30s. in the second, and 20s. in the
third. The expected value of the amount to be given is therefore

$(\tfrac{3}{10}\times 40 + \tfrac{3}{5}\times 30 + \tfrac{1}{10}\times 20)\text{s.} = 32\text{s.}$
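
The probabilities in Example 2 are described as easily shown; one way of
showing them is to enumerate the equally likely pairs of balls. The sketch
below does so and then forms the expectation. The names `balls`, `payment`
and `dist` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

balls = ["R", "R", "R", "W", "W"]        # 3 red, 2 white
payment = {"R": 20, "W": 10}              # shillings per ball drawn

# All C(5,2) = 10 unordered pairs of distinct balls are equally likely.
pairs = list(combinations(range(len(balls)), 2))

# Probability distribution of the amount paid.
dist = {}
for i, j in pairs:
    amount = payment[balls[i]] + payment[balls[j]]
    dist[amount] = dist.get(amount, 0) + Fraction(1, len(pairs))

expectation = sum(amount * p for amount, p in dist.items())
print(dist, expectation)
```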

Example 3. Show that, if c is constant,

$E(cx) = cE(x), \qquad E[c\psi(x)] = cE[\psi(x)].$

10. Expected value of a sum or a product of two variates

Let x and y be two variates, the first of which assumes the m values
$x_i$ with probabilities $p_i$ (i = 1, 2, ..., m), and the second assumes the
n values $y_j$ with probabilities $p'_j$ (j = 1, 2, ..., n). The sum x + y can
assume the mn values of $x_i + y_j$, since any of the m values of i may
be associated with any of the n values of j. Let $p_{ij}$ denote the
probability that x assumes the value $x_i$ and, at the same time, y
assumes the value $y_j$. Then by (5)

$E(x + y) = \sum_i\sum_j p_{ij}(x_i + y_j) = \sum_i\sum_j p_{ij}x_i + \sum_i\sum_j p_{ij}y_j = \sum_i p_i x_i + \sum_j p'_j y_j.$

For $\sum_j p_{ij}$, being the sum of the probabilities that x assumes the
value $x_i$ while y assumes one of the values $y_1, y_2, \ldots, y_n$, is equal to
$p_i$. Similarly $\sum_i p_{ij} = p'_j$. Consequently

$E(x + y) = E(x) + E(y).$    (10)

Thus the expected value of the sum of two variates is equal to the sum
of their expected values; and the theorem may be extended to the
sum of any number of variates.

The variates, x and y, are said to be independent* if the probability
that either of them will assume a prescribed value does not depend
on the value assumed by the other.

* Or, statistically independent. Since this is the kind of independence with
which we are chiefly concerned in these pages, the adverb 'statistically' will
usually be omitted. Statistical independence has been described by Aitken
(1939, 1, p. 148) as 'obedience to the multiplication theorem of probability'.
This will be apparent to the student after he has read §§ 12 and 31 below.
Statistical independence should not be confused with functional
independence. The variables x and y are functionally dependent when there exists a
functional relation F(x, y) = 0 which holds identically. They are functionally
independent if no such relation exists.

We proceed to examine the expected value of the product, xy, of two
independent variates. In the above notation this product may assume the mn
mutually exclusive values $x_i y_j$. The probability that the product will assume
a particular value $x_i y_j$ is $p_i p'_j$, being the product of the probabilities
that x will assume the value $x_i$ and y the value $y_j$. Hence the
expected value of the product is given by

$E(xy) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_i p'_j x_i y_j.$

Performing first the summation with respect to j and then that
with respect to i, we have

$E(xy) = \sum_i p_i x_i E(y) = E(y)\sum_i p_i x_i = E(x)\,E(y).$    (11)

Thus the expected value of the product of two independent variates is
equal to the product of their expected values.
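
Theorems (10) and (11) may be illustrated with the points shown by two
ordinary dice, thrown independently. The sketch below is an illustration
under that assumption of independence, which enters through the joint
probabilities formed as products; the variable names are choices made here.

```python
from fractions import Fraction
from itertools import product

# Two independent variates: the points shown by each of two ordinary dice.
px = {x: Fraction(1, 6) for x in range(1, 7)}
py = {y: Fraction(1, 6) for y in range(1, 7)}

# Independence: the joint probability p_ij is the product p_i * p'_j.
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

E_x = sum(p * x for x, p in px.items())
E_y = sum(p * y for y, p in py.items())
E_sum = sum(p * (x + y) for (x, y), p in joint.items())
E_prod = sum(p * x * y for (x, y), p in joint.items())
print(E_sum, E_prod)
```

The result for the sum does not depend on independence; that for the
product does, and would in general fail if the joint probabilities were not
products of the separate ones.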

If the mean of either of the independent variates, say x, is taken
as origin for that variate, then E(x) = 0, and consequently E(xy) = 0.
In particular this relation holds when both the variates are
measured from their means. The expected value of the product of the
deviations of the two variates from their means is called their
covariance. Thus the covariance of two independent variates is equal
to zero. From this we may deduce the important theorem:

The variance of the sum of two independent variates is equal to the
sum of their variances.

If the two variates are measured from their means the variance
of their sum, being the expected value of $(x + y)^2$, is given by

$E(x^2 + 2xy + y^2) = E(x^2) + E(y^2),$

and is therefore equal to the sum of the variances of x and y. The
theorem may clearly be extended to the sum of any number of
variates which are independent in pairs.

11. Repeated trials. Binomial distribution

Suppose that a trial is repeated, so that we have a series of n
trials. The happening of the event A as the result of a trial will be
called a success. We consider first the case in which the trials are
independent, and the probability p of success is the same for each
trial. Then the probability of failure is q, where p + q = 1. The
probability that there will be exactly r successes in the series of n
trials, and therefore n − r failures, is easily found. For, by the theorem
of compound probability, the chance of r successes and n − r failures
in a specified order is $p^r q^{n-r}$. But the number of different orders in
which these successes and failures may occur is $\binom{n}{r}$, being the number
of ways of selecting r out of the n positions for the successes.
Consequently, by the theorem of total probability, the chance P of
exactly r successes in the series of trials is given by

$P = \binom{n}{r} p^r q^{n-r}.$    (12)

Thus the probabilities of 0, 1, 2, ..., n successes, in a series of n
trials for an event of constant probability p, are the respective terms
of the binomial expansion

$(q + p)^n = q^n + nq^{n-1}p + \ldots + nqp^{n-1} + p^n.$

The probability distribution of the number of successes thus
determined is called the binomial distribution. Thus the binomial
distribution is that in which the variate assumes the values 0, 1, 2, ..., n
with corresponding probabilities $q^n, nq^{n-1}p, \ldots, p^n$.

We may show that the expected value of the variate in the binomial
distribution* is np, and its variance npq. In terms of repeated trials,
the expected value of the number of successes in one trial is p,
since the variate assumes the values 1 and 0 with probabilities p
and q respectively. And, since the number of successes in n trials
is the sum of the numbers of successes in the individual trials, it
follows from (10) that the expected value of this number is np.
Similarly, to find the variance of the distribution we observe that,
in one trial, the square of the number of successes takes the values
1 and 0 with probabilities p and q respectively, and the expected
value of $x^2$ in one trial is therefore p. Hence the variance of the
number of successes in one trial is

$E(x^2) - \{E(x)\}^2 = p - p^2 = pq.$

And since the number of successes in n trials is the sum of the
numbers in the individual trials, and these trials are independent, the
variance of the total number of successes is the sum of the variances
for the separate trials, and is therefore npq. The s.d. of the number
of successes in n trials is $\sqrt{npq}$; and the s.d. of the proportion of
successes in the series is therefore $\sqrt{pq/n}$.

* Other proofs of these properties will be given in § 15, Ex. 3, and § 17.
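
The mean np and variance npq can be confirmed numerically from the terms of
the binomial expansion. The following minimal sketch does so for n = 10 and
p = 2/5, values chosen purely for illustration; the function name `binomial`
is likewise an assumption. Exact fractions are used so that the comparison
involves no rounding.

```python
from fractions import Fraction
from math import comb

def binomial(n, p):
    """Probabilities of 0, 1, ..., n successes: the terms of (q + p)^n."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

n, p = 10, Fraction(2, 5)   # illustrative values
q = 1 - p
probs = binomial(n, p)

mean = sum(r * P for r, P in enumerate(probs))
variance = sum(r * r * P for r, P in enumerate(probs)) - mean**2
print(sum(probs), mean, variance)
```

This is a verification for particular values and not a proof; the argument
by summing over the separate trials establishes the result for all n and p.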

Example 1. Find the most probable number of successes in the above series
of trials, that is to say, the number of successes which has a greater
probability than any other.

The chance of r + 1 successes will be greater than that of r if

$\binom{n}{r+1}p^{r+1}q^{n-r-1} > \binom{n}{r}p^r q^{n-r},$

that is, if

$\frac{n-r}{r+1}\cdot\frac{p}{q} > 1,$ or $(n+1)p > r + 1.$

Hence the most probable number of successes is the integral part of (n + 1)p.
If, however, (n + 1)p is an integer, r + 1, the chance of r + 1 successes is equal
to that of r successes, and is greater than that of any other number. Thus, if
p = 2/5, the most probable number of successes in 20 trials is the integral
part of 21 × 2/5, which is 8. In 24 trials, however, 9 and 10 are the most
probable numbers.
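
The rule for the most probable number of successes can be tested by
computing every term of the distribution and picking out the largest. The
sketch below is an illustration only; the function name `most_probable` is
an assumption, and exact fractions are used so that a tie between two
adjacent terms is detected reliably.

```python
from fractions import Fraction
from math import comb

def most_probable(n, p):
    """Return every r in 0..n whose binomial probability is the largest."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    best = max(probs)
    return [r for r, P in enumerate(probs) if P == best]

p = Fraction(2, 5)
print(most_probable(20, p), most_probable(24, p))
```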

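Example 2. A's chance of winning a game against B is 2/3. Find his
chance of winning at least three games out of five.

This chance is the sum of the probabilities that A will win 3, 4, and 5 games,
and is therefore

$\binom{5}{3}\left(\tfrac{2}{3}\right)^3\left(\tfrac{1}{3}\right)^2 + \binom{5}{4}\left(\tfrac{2}{3}\right)^4\left(\tfrac{1}{3}\right) + \left(\tfrac{2}{3}\right)^5 = \frac{64}{81}.$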

Example 3. Poisson's series of trials is a series of n trials in which the
probabilities of success are $p_1, p_2, p_3, \ldots, p_n$ respectively. Show that the
expected value of the number of successes is $\sum_{i=1}^{n} p_i$ and the variance
$\sum_{i=1}^{n} p_i q_i$.

By the same argument as above, the expected value of the number of
successes at the ith trial is $p_i$, and its variance $p_i q_i$. Hence the result.

Example 4. Verify from (7) that a distribution, in which the variate takes
the values 1 and 0 with probabilities p and q respectively, has variance pq.

12. Continuous probability distributions

As in the case of frequency, a continuous distribution is one in
which the variable may take any value between certain limits,
a and b. The number of different values is then infinite, and it is
useless to speak of the probability of any particular value. Instead
of this we consider the probability that the value of the variate will
fall within a specified interval in the range of the variate. We confine
our attention to cases in which the probability that the value of the
variate will fall within the infinitesimal interval x − ½dx to x + ½dx
is expressible in the form $\phi(x)\,dx$, where $\phi(x)$ is a continuous function
of x, called the probability density or the probability function. It is,
of course, never negative. The continuous curve $y = \phi(x)$ is called
the probability curve; and, when this is symmetrical, the
distribution is said to be symmetrical. The area under the curve from x = α
to x = β represents the probability that the value of x will fall
within the interval α to β. The total area under the curve is unity.
Thus

$\int_a^b \phi(x)\,dx = 1.$    (13)

When $\phi(x)$ is constant, the variate is said to have a uniform or
rectangular distribution of probability. The value of the constant
must be 1/(b − a), in virtue of (13).

The rth moment of the distribution about a particular value is
the rth moment of the probability about that value. Since $\phi(x)\,dx$
is the probability corresponding to the interval dx, the rth moment
about x = 0 is given by

$\mu'_r = \int_a^b x^r\phi(x)\,dx.$    (14)

In particular, the expected value of x, being the first moment about
x = 0, is

$E(x) = \mu'_1 = \int_a^b x\,\phi(x)\,dx,$    (15)

and the expected value of the function $\psi(x)$ is

$E[\psi(x)] = \int_a^b \psi(x)\phi(x)\,dx.$    (16)

The rth moment about the mean of the distribution is

$\mu_r = \int_a^b [x - E(x)]^r\phi(x)\,dx,$    (17)

which is the expected value of the rth power of the deviation of x
from the mean. The variance of x, being the second moment about
the mean, is given by

$\sigma^2 = \int_a^b [x - E(x)]^2\phi(x)\,dx,$

or $\sigma^2 = E(x^2) - [E(x)]^2.$    (18)
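
For a continuous distribution the sums of § 9 are replaced by integrals, and
these may be evaluated numerically when a check is wanted. The sketch below
is an illustration, not part of the theory: it borrows the density
3x(2 − x)/4 on the range 0 to 2 from Examples I, No. 9, and uses the
composite Simpson's rule as one convenient method of quadrature. The
function names and the number of subdivisions are assumptions.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

# Illustrative density on [0, 2]; any continuous phi with unit area would do.
a, b = 0.0, 2.0
phi = lambda x: 3 * x * (2 - x) / 4

def expect(psi):
    """E[psi(x)] = integral of psi(x) * phi(x) over [a, b]."""
    return simpson(lambda x: psi(x) * phi(x), a, b)

area = expect(lambda x: 1.0)
mean = expect(lambda x: x)
variance = expect(lambda x: (x - mean) ** 2)
print(round(area, 6), round(mean, 6), round(variance, 6))
```

The total area should come out as unity, which is a useful first test that
the function supplied is in fact a probability density.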

The theorems of § 10 are equally true for continuous distributions.
Let the values of x range from a to b, and those of a second variate
y from c to d. We may assume that the probability that x falls in the
interval dx and, at the same time, y falls in the interval dy is jointly
proportional to dx and dy, and expressible in the form $\phi(x, y)\,dx\,dy$,
where $\phi(x, y)$ is a continuous function of the two variates. Then the
expected value of the sum of the variates is

$E(x + y) = \int_a^b\!\int_c^d (x + y)\,\phi(x, y)\,dx\,dy = \int_a^b\!\int_c^d x\,\phi(x, y)\,dx\,dy + \int_a^b\!\int_c^d y\,\phi(x, y)\,dx\,dy.$

In the first of these two integrals let the integration with respect to
y be performed first. Then $dx\int_c^d \phi(x, y)\,dy$ is the probability that x
will fall in the interval dx irrespective of the value of y. Denote it
by $\phi_1(x)\,dx$. Similarly, in the second integral, $dy\int_a^b \phi(x, y)\,dx$ is the
probability that y will fall in the interval dy irrespective of the value
of x. Denote it by $\phi_2(y)\,dy$. Then

$E(x + y) = \int_a^b x\,\phi_1(x)\,dx + \int_c^d y\,\phi_2(y)\,dy = E(x) + E(y),$    (19)

as required.

In considering the expected value of the product xy we assume
that the variates are independent. Then the probability that the
value of x falls in the interval dx is independent of the value of y,
and may be expressed as $\phi_1(x)\,dx$. Similarly, the probability that
y falls in the interval dy is independent of the value of x, and is
expressible in the form $\phi_2(y)\,dy$. Therefore the probability that x
falls in the interval dx and, at the same time, y falls in the interval
dy is $\phi_1(x)\phi_2(y)\,dx\,dy$. Accordingly, the expected value of the
product xy is

$$E(xy) = \int_a^b\!\int_c^d xy\,\phi_1(x)\phi_2(y)\,dx\,dy
= \int_a^b x\,\phi_1(x)\,dx \int_c^d y\,\phi_2(y)\,dy = E(x)\,E(y). \tag{20}$$

Thus the expected value of the product of two independent variates
is equal to the product of their expected values. And it follows, as
in § 10, that the covariance of two independent variates is equal
to zero.


Example 1. Show that, if the variable x has uniform distribution of
probability over the range −a to +a, then $\phi(x) = 1/2a$, $\bar{x} = 0$, $\sigma^2 = \tfrac13 a^2$,
the moments of odd order about x = 0 are zero, and that of order 2n is
$a^{2n}/(2n+1)$.
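
The results of Example 1 can be checked numerically. The following is only a rough sketch: the function name `uniform_moment`, the value a = 2 and the midpoint rule of integration are choices made here for illustration, and are not part of the text.

```python
def uniform_moment(a, r, steps=20_000):
    """Approximate the r-th moment about x = 0 of the density 1/(2a) on (-a, a)."""
    h = 2 * a / steps
    density = 1 / (2 * a)
    # Midpoint rule: evaluate x**r at the centre of each sub-interval.
    return sum(((-a + (k + 0.5) * h) ** r) * density * h for k in range(steps))

a = 2.0
variance = uniform_moment(a, 2)   # the mean is 0, so this is also the variance
odd = uniform_moment(a, 3)        # a moment of odd order
fourth = uniform_moment(a, 4)     # the moment of order 2n with n = 2

print(variance, a**2 / 3)
print(odd)
print(fourth, a**4 / 5)
```

The printed pairs should agree closely, and the odd moment should be negligible.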

Example 2. Through a point B on the y-axis, whose ordinate is positive
and equal to a, a straight line is drawn in a direction taken at random in the
interval θ = −¼π to θ = ¼π, θ being the inclination of the line to BO. Examine
the probability distribution of the intercept x on the x-axis.

We interpret the data as meaning that the variable θ has uniform
distribution of probability in the interval −¼π to ¼π. Hence the probability that θ
will fall in the interval dθ is 2dθ/π. Now the intercept on the x-axis has the
value x = a tan θ, so that θ = arctan(x/a), and therefore

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

But, when θ falls in the interval dθ, x falls in the corresponding interval dx.
Hence the probability density for the distribution of x is

$$\phi(x) = \frac{2a}{\pi(a^2 + x^2)}.$$

This is also the relative frequency density of the distribution in § 6, Ex. 2,
and the properties of the distributions are the same. There is symmetry
about x = 0, and the s.d. is 0.52a.
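
The value of the s.d. quoted in Example 2 may be verified as follows. This is an illustrative sketch only: it takes the inclination as uniform between −π/4 and π/4 (so that x = a tan θ lies between −a and a), and the closed form a√(4/π − 1), obtained here by integrating x²φ(x) over that range, is a working assumption of the sketch rather than a formula quoted from the text. The name `intercept_sd` is hypothetical.

```python
import math

def intercept_sd(a, steps=20_000):
    """Standard deviation of x = a*tan(theta), theta uniform on (-pi/4, pi/4)."""
    lo, hi = -math.pi / 4, math.pi / 4
    h = (hi - lo) / steps
    weight = 2 / math.pi                  # probability density of theta
    second = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h        # midpoint of the k-th sub-interval
        second += (a * math.tan(theta)) ** 2 * weight * h
    return math.sqrt(second)              # the mean is 0 by symmetry

sd = intercept_sd(1.0)
exact = math.sqrt(4 / math.pi - 1)        # closed form for a = 1 (derived in the lead-in)
print(sd, exact)
```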


13. Theorems of Tchebychef and Bernoulli 

Consider first a theorem due to Tchebychef. Let x be a variate,
either discrete or continuous, with s.d. σ. The theorem to be proved
fixes an upper limit to the probability that a value of the variate,
chosen at random, will differ from the mean by more than λσ,
where λ is a given positive number. Tchebychef's theorem may be
stated:

In a random choice of a value of a variate, whose standard deviation
is σ, the probability that the value chosen will differ from the mean by
more than λσ does not exceed 1/λ².

To prove it we observe that the variance σ² is the second moment
of the probability of the whole distribution about the mean. Now,
if the combined probability P of values further than λσ from the
mean were greater than 1/λ², the second moment of the probability
of these values alone would exceed (λσ)²/λ², i.e. σ², and that of the
whole distribution would, a fortiori, be greater than σ². Since this
is not so the statement in the theorem must be true. Thus

$$P \le \frac{1}{\lambda^2}. \tag{21}$$

In particular the probability that a value of the variate will differ
from the mean by more than 3σ does not exceed 1/9. The theorem
is a very conservative one, since the actual value of P is usually
very much less than 1/λ². The result, however, applies to all
distributions; and we shall now use it to prove a theorem due to
Bernoulli.
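
How conservative the theorem is may be seen by comparing the limit 1/λ² with the actual probability for a particular distribution. The sketch below uses a binomial distribution with n = 20 and p = 0.3; these values, and the function names, are chosen here purely for illustration.

```python
import math

def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n successes."""
    return [math.comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]

def tail_probability(n, p, lam):
    """P(|x - mean| > lam * sd) for a binomial variate, by direct summation."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    pmf = binomial_pmf(n, p)
    return sum(pr for r, pr in enumerate(pmf) if abs(r - mean) > lam * sd)

for lam in (1.5, 2.0, 3.0):
    actual = tail_probability(20, 0.3, lam)
    print(f"lambda = {lam}: P = {actual:.4f}, limit 1/lambda^2 = {1 / lam**2:.4f}")
```

In each line of output the actual probability should fall below the limit given by the theorem.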

Let m be the number of successes obtained in n independent
trials, in which the constant probability of the occurrence of the
event A is p. The quotient m/n is the relative frequency of successes.
How does this quotient behave as n increases indefinitely? It is a
matter of common knowledge that, in trials of this nature in which
the value of p is known, the relative frequency obtained is usually a
close approximation to p when n is large. But there is no proof,
based on the measure of probability given in § 7, that the relative
frequency of successes will have p as limiting value when n tends to
infinity. James Bernoulli, however, proved that, given any positive
number ε however small, the probability of |m/n − p| exceeding ε
tends to zero as n tends to infinity. This is expressed in modern
terminology by saying that m/n converges in probability to p as n
tends to infinity. Bernoulli's theorem may also be expressed in the
form:

Let ε and η be two given positive numbers, however small, and let m
be the number of successes in n independent trials, in which the constant
probability of success is p. Then the probability that the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon \tag{22}$$

will hold is greater than 1 − η, provided that n is greater than a certain
number N, depending on ε and η.


A simple proof may be given as follows. It was shown in § 11 that
the mean value of the relative frequency of successes in such a
series of trials is p, and its variance σ² is pq/n. If then we write
ε = λσ, the condition

$$\left|\frac{m}{n} - p\right| > \epsilon$$

is the condition that the relative frequency of successes should
differ from its mean by more than λσ. But, by Tchebychef's theorem,
the probability of this does not exceed 1/λ², so that

$$P\left(\left|\frac{m}{n} - p\right| > \epsilon\right) \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\epsilon^2} = \frac{pq}{n\epsilon^2},$$

and this is less than η for all values of n greater than pq/(ηε²). Hence
Bernoulli's theorem.

The theorem does not assert that the inequality (22) must hold
for all values of n greater than N, but that the probability of its not
holding is less than η. Even this probability, however small, leaves
room for the possibility that it may not hold on some particular
occasion. Bernoulli's theorem explains the practice of taking m/n,
for a large value of n, as an approximation to the value of p. Indeed,
it frequently happens that this use of relative frequency is our only
means of estimating the probability of the event.
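
The number N supplied by the proof can be computed, and the probability of the inequality (22) evaluated exactly, for any chosen values. In the sketch below the values p = 0.5, ε = 0.1 and η = 0.05, and the names `trials_needed` and `prob_within`, are assumptions made only for the illustration; for these values pq/(ηε²) is roughly 500.

```python
import math

def trials_needed(p, eps, eta):
    """An integer n exceeding pq/(eta * eps^2), the quantity arising in the proof."""
    q = 1 - p
    return math.floor(p * q / (eta * eps**2)) + 1

def prob_within(n, p, eps):
    """Exact P(|m/n - p| < eps) when m is the number of successes in n trials."""
    return sum(
        math.comb(n, m) * p**m * (1 - p) ** (n - m)
        for m in range(n + 1)
        if abs(m / n - p) < eps
    )

p, eps, eta = 0.5, 0.1, 0.05
N = trials_needed(p, eps, eta)
print(N, prob_within(N, p, eps), 1 - eta)
```

The exact probability should be found to exceed 1 − η by a wide margin, which again reflects the conservative nature of Tchebychef's limit.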


14. Empirical definition of probability 
As already indicated, the definition of probability in terms of
equally likely cases does not lend itself to every instance in which
a numerical evaluation of probability is desired. Another definition
of the probability of an event is sometimes given in terms of the
relative frequency of the occurrence of the event in an extended
series of trials. The fundamental assumption for such a definition
is that this relative frequency, in a uniform series of trials, tends to
a definite limit as the number of trials in the series tends to infinity;
and this limit is taken as the measure of the probability of the
occurrence of the event in another such trial. No proof can be given
that the above relative frequency does tend to a limit. Convergence
in probability, mentioned in connection with Bernoulli's theorem,
is a consequence of the original definition, and is not the same
thing as the convergence to a limit assumed in the empirical
definition. But, though the two approaches are not theoretically
equivalent, they may be regarded as in agreement for practical
purposes.

By means of the assumption on which the empirical definition is
based, the laws of addition and multiplication of probabilities can
be deduced. Suppose, as in § 8, that a trial may result in any one of
the mutually exclusive events $A_i$. In a series of n trials let $m_i$ be
the number of times in which the event $A_i$ happens. Then the
probability of the happening of this event is given by

$$p_i = \lim_{n\to\infty} \frac{m_i}{n}.$$

Since the number of times in which one of the events $A_1, A_2, \dots, A_k$
happens is $\sum_{i=1}^{k} m_i$, the probability of the happening of one of these
events is

$$p = \lim_{n\to\infty} \sum_{i=1}^{k} \frac{m_i}{n} = \sum_{i=1}^{k} \lim_{n\to\infty} \frac{m_i}{n} = \sum_{i=1}^{k} p_i,$$

and we have the theorem of total probability as in § 8.

Suppose next that A and B are different events that may happen
as the results of specified trials T and T' respectively. We require
the probability that, in a pair of such trials, both events will happen.
In a series of n pairs of trials let m be the number of times in which A
happens, and m' the number of times in which both A and B happen.
These occasions are all included in the m occasions on which A
happens. Then the probability of the happening of both events is

$$P = \lim_{n\to\infty} \frac{m'}{n} = \left(\lim_{n\to\infty} \frac{m}{n}\right)\left(\lim_{m\to\infty} \frac{m'}{m}\right),$$

since m tends to infinity with n, if the probability of A is not zero.
Now lim m/n is the probability of A, and lim m'/m is the
conditional probability of B on the assumption that A has happened.
We thus have the theorem of compound probability as in § 8. In
the particular case of independence the probability of the occurrence
of the two events is simply the product of the probabilities of the
separate events. As we have already seen, the theorems of total and
compound probability are the foundations of the mathematical
theory. The measure of probability by relative frequency is also
fundamental. In the a priori definition it is the relative frequency
of favourable cases in the total number of cases, while, in the
empirical definition, it is the limit of the relative frequency of the
happenings of the event under consideration.


15. Moment generating function and characteristic function 
Let $\phi(x)$ be the probability density in the distribution of the
variate x. The expected value of $e^{tx}$ is a function of t given by

$$M(t) = \int e^{tx}\phi(x)\,dx, \tag{23}$$

where the integration is taken over the whole range of x. When this
integral has a meaning for a certain range of values of t, we may
expand the exponential and integrate term by term, thus obtaining
the formula

$$M(t) = 1 + \mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots + \mu'_r\frac{t^r}{r!} + \dots, \tag{24}$$

where

$$\mu'_r = \int x^r \phi(x)\,dx,$$

being the moment of order r about the origin (x = 0). For this
reason the function M(t), defined by (23), is called the moment
generating function (m.g.f.) of the distribution about the value
x = 0. Similarly the m.g.f. about the value x = a is defined
as the expected value of exp[t(x − a)]. Denoting this by $M_a(t)$
we have then

$$M_a(t) = \int \exp[t(x-a)]\,\phi(x)\,dx, \tag{25}$$

provided the integral has a meaning; and the expansion of the
exponential, and term by term integration, then show that the
coefficient of $t^r/r!$ is the moment of order r about the value x = a.
Since the factor $e^{-at}$ is independent of x, it follows from (25) that

$$M_a(t) = e^{-at}M_0(t), \tag{26}$$

the subscript indicating the value with respect to which the m.g.f.
is constructed.

When the variate x takes only the discrete values $x_i$ with
probabilities $p_i$ (i = 1, 2, ..., n), the m.g.f. with respect to the value
x = a, being the expected value of exp[t(x − a)], is given by

$$M_a(t) = \sum_i p_i \exp[t(x_i - a)] = e^{-at}M_0(t) \tag{27}$$

as before; and the coefficient of $t^r/r!$ in the expansion is the moment
of order r about the value in question. The m.g.f. will be found useful,
not only in calculating the moments of a distribution, but also in
leading to concise proofs of various theorems. Along with it may
be mentioned the characteristic function* (c.f.), which is the
expected value of $e^{itx}$, where $i^2 = -1$ and t is real.

Example 1. For the exponential distribution defined by $\phi(x) = ce^{-cx}$, in
which c is positive and x varies from 0 to ∞, the m.g.f. with respect to the
origin is

$$M(t) = c\int_0^\infty \exp(tx - cx)\,dx \qquad (|t| < c)$$
$$= \frac{c}{c-t} = 1 + \frac{t}{c} + \frac{t^2}{c^2} + \dots + \frac{t^r}{c^r} + \dots,$$

showing that

$$\mu'_1 = \frac{1}{c}, \qquad \mu'_2 = \frac{2!}{c^2}, \qquad \dots, \qquad \mu'_r = \frac{r!}{c^r}.$$

Thus the mean is 1/c, and the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = \frac{1}{c^2},$$

and the s.d. by σ = 1/c.

* Cf. Lévy, 1925, 3, part ii, chapters ii and iii; Cramér, 1937, 6,
pp. 23-68; and Kendall, 1943, 2, pp. 90-100.

Example 2. For the rectangular distribution, $\phi(x) = 1/2a$, −a < x < a,
we have

$$M_0(t) = \frac{1}{2a}\int_{-a}^{a} e^{tx}\,dx = \frac{e^{at} - e^{-at}}{2at} = \frac{\sinh at}{at}$$
$$= 1 + \frac{a^2t^2}{3!} + \frac{a^4t^4}{5!} + \dots + \frac{a^{2r}t^{2r}}{(2r+1)!} + \dots.$$

The moments of odd order are zero, and $\mu'_{2r} = a^{2r}/(2r+1)$.

Example 3. For the binomial distribution the m.g.f. with respect to the
origin is, in virtue of (27),

$$M(t) = q^n + e^{t}\,nq^{n-1}p + e^{2t}\binom{n}{2}q^{n-2}p^2 + \dots$$
$$= (q + pe^t)^n = (1 + pt + pt^2/2! + pt^3/3! + \dots)^n.$$

The mean, being the coefficient of t in the expansion, is np. Similarly $\mu'_2$,
being the coefficient of $t^2/2!$, has the value

$$np + \binom{n}{2}\,2!\,p^2 = np[1 + (n-1)p].$$

Consequently the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = np(1-p) = npq.$$


A very important property of the m.g.f. is expressed in the
theorem:

The moment generating function of the sum of two independent
variates is the product of their moment generating functions.

This is a direct consequence of the theorem concerning the
expected value of the product of two independent variates. For, if
x and y are independent variates, the m.g.f. of their sum with respect
to the origin is

$$E(e^{t(x+y)}) = E(e^{tx}\cdot e^{ty}) = E(e^{tx})\,E(e^{ty}),$$

and is therefore the product of their m.g.f.'s. And, since the origin
may be chosen at pleasure, the theorem holds for the m.g.f.'s about
any specified value.

Let $\mu_r$, $\mu'_r$ denote the moments of x about the mean and the origin,
and $m_r$, $m'_r$ those of y. Then, by the above theorem, the m.g.f. of
x + y has an expansion obtained from the product

$$\left(1 + \mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots\right)\left(1 + m'_1 t + m'_2\frac{t^2}{2!} + \dots\right).$$

The coefficient of t in this product is $\mu'_1 + m'_1$, so that the mean of the
sum of the variates is the sum of their means. From the coefficient
of $t^2/2!$ we find that the second moment of x + y about the origin is
$\mu'_2 + 2\mu'_1 m'_1 + m'_2$. Consequently the second moment of x + y about
its mean is

$$\mu'_2 + 2\mu'_1 m'_1 + m'_2 - (\mu'_1 + m'_1)^2 = (\mu'_2 - \mu'^2_1) + (m'_2 - m'^2_1) = \mu_2 + m_2,$$

and is thus equal to the sum of the variances of x and y. Since the
variance of −y is equal to that of y, we have again the theorem:

The variance of the sum (or difference) of two independent variates
is equal to the sum of their variances.
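
Both theorems are easily illustrated for discrete variates. In the following sketch the two variates (the score on a die and the number of heads in one toss of a coin), the value t = 0.3 and all function names are hypothetical choices made for the illustration; the distribution of the sum is formed directly from the products of the separate probabilities.

```python
import math

def mgf(dist, t):
    """M(t) = sum of p_i * exp(t * x_i) for a discrete distribution {x_i: p_i}."""
    return sum(p * math.exp(t * x) for x, p in dist.items())

def distribution_of_sum(dx, dy):
    """Distribution of x + y for independent discrete variates."""
    out = {}
    for x, px in dx.items():
        for y, py in dy.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def variance(dist):
    """Second moment about the mean."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** 2 for x, p in dist.items())

die = {k: 1 / 6 for k in range(1, 7)}   # score on one throw of a die
coin = {0: 0.5, 1: 0.5}                 # number of heads in one toss
total = distribution_of_sum(die, coin)

t = 0.3
print(mgf(total, t), mgf(die, t) * mgf(coin, t))
print(variance(total), variance(die) + variance(coin))
```

Each printed pair should agree to within rounding error.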

From their definitions it is clear that the moment generating
function, when it exists, and the characteristic function, which
always exists, are determined by the probability density $\phi(x)$
of the distribution. Conversely, the distribution is determined
uniquely by its characteristic function,* or by the moment
generating function when the moments satisfy certain conditions;†
that is to say, variates which have the same moment generating
function conform to the same distribution. Proofs of these
statements are beyond the scope of this book. In the three cases in
which we shall make use of this converse theorem (pp. 49, 57-8
and 151) the moments of the distributions satisfy the necessary
conditions.

16. Cumulative function of a distribution 
If the logarithm of the m.g.f. of a distribution can be expanded
as a convergent series in powers of t, viz.

$$K(t) = \log M(t) = \kappa_1 t + \kappa_2\frac{t^2}{2!} + \dots + \kappa_r\frac{t^r}{r!} + \dots, \tag{28}$$

the coefficients $\kappa_1, \kappa_2, \dots, \kappa_r, \dots$ are called the cumulants (or seminvariants‡) of
the distribution, and K(t) is the cumulative function. The cumulants
are determinate functions of the moments. For instance, on taking
logarithms of both members of (24) and identifying with (28), we
see that $\kappa_1 = \mu'_1$, and is thus equal to the mean of the distribution.

* For a proof of this converse theorem see Lévy, 1925, 3, pp. 166-7;
also Deltheil, Erreurs et Moindres Carrés, pp. 26-9 (Fasc. 2, Tome 1 of
Traité du Calcul des Probabilités et de ses Applications, ed. E. Borel),
Gauthier-Villars, Paris, 1930, and Kendall, 1943, 2, pp. 90-4.

† Cf. Kendall, 1943, 2, pp. 105-10.

‡ We adopt Fisher's terminology of cumulants (1929, 1) rather than
Thiele's of seminvariants (1903, 1), since Dressel has shown that the cumulants
are only one particular set of seminvariants (Ann. Math. Stat. vol. xi, pp.
36-57, 1940).

Since, by (26), the m.g.f. with respect to the mean is

$$M_0(t)\exp(-\mu'_1 t),$$

it is clear on taking logarithms that the cumulative function with
respect to the mean differs from that with respect to x = 0 only
by the addition of the term $-\mu'_1 t$. Consequently the cumulative
function relative to the mean is

$$\kappa_2\frac{t^2}{2!} + \kappa_3\frac{t^3}{3!} + \dots + \kappa_r\frac{t^r}{r!} + \dots. \tag{29}$$

And since this must be identical with

$$\log\left(1 + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \dots\right),$$

we see, on comparing coefficients of like powers of t, that the first
few cumulants are given in terms of the moments of the distribution
by the formulae

$$\kappa_1 = \mu'_1 = \text{mean}, \qquad \kappa_2 = \mu_2, \qquad \kappa_3 = \mu_3, \qquad \kappa_4 = \mu_4 - 3\mu_2^2. \tag{30}$$

Thus all the cumulants after the first are independent of the value
with respect to which the cumulative function is constructed. The
mean, and the moments about the mean, may therefore be found by
calculating the cumulative function with respect to any convenient
origin. Also, from the last of the relations (30), we have

$$\frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4 - 3\mu_2^2}{\mu_2^2} = \frac{\mu_4}{\mu_2^2} - 3, \tag{31}$$

and this is the excess of kurtosis, as defined at the close of § 6.
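
The relations (30) and (31) translate directly into a short computation. The sketch below is illustrative only: the function names are hypothetical, and the binomial distribution with n = 10 and p = 0.3 is used merely as a convenient test case, for which the cumulants are known to be np, npq, npq(q − p) and npq(1 − 6pq).

```python
import math

def central_moments(dist):
    """Mean, and the 2nd, 3rd and 4th moments about the mean, of {x_i: p_i}."""
    mean = sum(p * x for x, p in dist.items())
    mu = {r: sum(p * (x - mean) ** r for x, p in dist.items()) for r in (2, 3, 4)}
    return mean, mu

def cumulants(dist):
    """First four cumulants obtained from the relations (30)."""
    mean, mu = central_moments(dist)
    return mean, mu[2], mu[3], mu[4] - 3 * mu[2] ** 2

# Binomial distribution with n = 10, p = 0.3 (a test case chosen for this sketch).
n, p = 10, 0.3
binom = {r: math.comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)}

k1, k2, k3, k4 = cumulants(binom)
excess = k4 / k2**2          # the excess of kurtosis, as in (31)
print(k1, k2, k3, k4, excess)
```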

Example. Since by § 15, Ex. 1, the m.g.f. of the exponential distribution,
with respect to the origin, is

$$M(t) = (1 - \sigma t)^{-1}, \qquad \sigma = 1/c,$$

the cumulative function is

$$K(t) = -\log(1 - \sigma t) = \sigma t + \frac{\sigma^2t^2}{2} + \frac{\sigma^3t^3}{3} + \dots + \frac{\sigma^rt^r}{r} + \dots.$$

Thus $\kappa_r = \sigma^r (r-1)!$. The mean, being the coefficient of t, is σ. The
variance is $\sigma^2$, $\mu_3 = 2\sigma^3$, and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = (3! + 3)\sigma^4 = 9\sigma^4.$$

Further, since the m.g.f. of the sum of two independent variates,
x and y, is equal to the product of their m.g.f.'s, it follows that the
cumulative function of the distribution of x + y is the sum of those
of x and y. Equating coefficients of like powers of t, we have the
simple result that the rth cumulant of x + y is the sum of the rth
cumulants of x and y. This is the additive property of cumulants. The
theorem is obviously true for the sum of any finite number of
independent variates. The particular case of second cumulants gives again
the theorem on the variance of the sum of several independent variates.

The above property of cumulants may be used to estimate the
average corrections* to be applied to the moments of a grouped
distribution, with specified class interval c, when the interval-mesh
is located at random on the ungrouped distribution. The corrections
found, which are of the same form as Sheppard's adjustments, may
be wrong in any individual instance, but their average effect in a
large number of cases will be correct. In any class the mid-value $x_1$,
from which the moments of the grouped distribution are calculated,
is the sum of the true value x of the observation and the grouping
error $x_1 - x$. In consequence of the random location of the class
limits, the grouping error is uniformly distributed over the range
−½c to ½c. The average cumulants of the grouped distribution
will differ from those of the ungrouped by the cumulants of the
grouping error. Now the m.g.f. for the uniform distribution of
error is

$$M(t) = \frac{2}{ct}\sinh(\tfrac12 ct) = 1 + \frac{c^2t^2}{24} + \frac{c^4t^4}{1920} + \dots,$$

and its cumulative function is therefore

$$K(t) = \frac{c^2}{12}\,\frac{t^2}{2!} - \frac{c^4}{120}\,\frac{t^4}{4!} + \dots.$$

If then $k_r$ are the cumulants of the grouped distribution, those of
the ungrouped distribution are

$$\kappa_1 = k_1, \qquad \kappa_2 = k_2 - \frac{c^2}{12}, \qquad \kappa_3 = k_3, \qquad \kappa_4 = k_4 + \frac{c^4}{120}.$$

* Cf. Cornish and Fisher, 1937, 7, pp. 3-4, and Kendall, 1943, 2, pp. 74-5.

Denoting the moments of the grouped distribution by $m_r$ and $m'_r$,
we therefore have for those of the ungrouped distribution

$$\mu'_1 = m'_1, \qquad \mu_2 = m_2 - \frac{c^2}{12}, \qquad \mu_3 = m_3, \tag{32}$$

and

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = k_4 + \frac{c^4}{120} + 3\left(k_2 - \frac{c^2}{12}\right)^2
= m_4 - \tfrac12 c^2 m_2 + \frac{7c^4}{240}. \tag{33}$$

The equation (32) shows that the estimate $m'_1$ of the mean, and the
estimate $m_3$ of the third moment about the mean, as found from the
grouped distribution, are sufficiently accurate. The calculated
variance $m_2$ should be diminished by $c^2/12$; while the adjustment to
the fourth moment is given by (33).
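
The adjustments to the second and fourth moments may be applied mechanically once the class interval is known. The sketch below assumes the corrections in the form μ₂ = m₂ − c²/12 and μ₄ = m₄ − ½c²m₂ + 7c⁴/240; the function name and the numerical moments are hypothetical, chosen only to exercise the formulae.

```python
def sheppard_adjust(m2, m4, c):
    """Average grouping corrections to the 2nd and 4th moments about the mean.

    m2, m4 : moments computed from the class mid-values
    c      : class interval
    """
    mu2 = m2 - c**2 / 12
    mu4 = m4 - 0.5 * c**2 * m2 + 7 * c**4 / 240
    return mu2, mu4

# Hypothetical grouped moments, used only for illustration.
m2, m4, c = 4.0, 50.0, 1.0
mu2, mu4 = sheppard_adjust(m2, m4, c)

# The fourth cumulant should change by c^4/120 under the adjustment.
print(mu2, mu4, (mu4 - 3 * mu2**2) - (m4 - 3 * m2**2))
```

The last printed quantity is the change in the fourth cumulant, which should equal c⁴/120.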
EXAMPLES II

1. Two cards are drawn at random from a well-shuffled pack of
52. Show that the chance of drawing two aces is 1/221.

2. The chance of throwing a 6 at least once in two throws of a
die is 11/36.

3. A and B toss a coin alternately on the understanding that
the first to obtain heads wins the toss. Show that their respective
chances of winning are 2/3 and 1/3.

4. Four persons are chosen at random from a group containing
3 men, 2 women and 4 children. The chance that exactly two of
them will be children is 10/21.

5. From an urn containing r red and s white balls, a + b balls are
drawn at random without replacement (a ≤ r, b ≤ s). Show that the
probability of a red and b white balls is

$$\binom{r}{a}\binom{s}{b}\bigg/\binom{r+s}{a+b}.$$

6. Show that, in a single throw with two dice, the chance of
throwing more than 7 is equal to that of throwing less than 7,
each being 5/12.

7. A and B take turns in throwing two dice, the first to throw 9
being awarded the prize. Show that their chances of winning are
in the ratio 9 : 8.

8. Three men toss in succession for a prize to be given to the one
who first obtains heads. Show that their chances of winning are
4/7, 2/7 and 1/7.

9. Eight coins are thrown simultaneously. Show that the chance
of obtaining at least six heads is 37/256.

10. The expectation of the number of failures preceding the first
success in an indefinite series of independent trials, with constant
probability p of success, is

$$qp + 2q^2p + 3q^3p + \dots = \frac{qp}{(1-q)^2} = \frac{q}{p}.$$

(Uspensky, 1937, 2, p. 178, Ex. 3.)

11. A point P is taken at random in a line AB, of length 2a, all
positions of the point being equally likely. Show that the expected
value of the area of the rectangle AP·PB is $2a^2/3$, and that the
probability of the area exceeding $\tfrac12 a^2$ is $1/\sqrt{2}$.

12. From a point on the circumference of a circle of radius a,
a chord is drawn in a random direction (i.e. all directions are equally
likely). Show that the expected value of the length of the chord
is $4a/\pi$, and that the variance of the length is $2a^2(1 - 8/\pi^2)$. Also
show that the chance is 1/3 that the length of the chord will exceed
the length of the side of an equilateral triangle inscribed in the
circle.

13. A chord of a circle of radius a is drawn parallel to a given
straight line, all distances from the centre of the circle being equally
likely. Show that the expected value of the length of the chord is
$\tfrac12\pi a$, and that the variance of the length is $\dfrac{a^2}{12}(32 - 3\pi^2)$. Also show
that the chance is 1/2 that the length of the chord will exceed the
length of the side of an equilateral triangle inscribed in the circle.

14. Two different digits are chosen at random from the set
1, 2, 3, ..., 8. Show that the probability that the sum of the digits
will be equal to 6 is the same as the probability that their sum will
exceed 13, each being 1/14. Also show that the chance of both digits
exceeding 5 is 3/28.

15. In Poisson's distribution the variate takes the values
0, 1, 2, 3, ... with probabilities proportional to $1, m, m^2/2!, m^3/3!, \dots$,
both sequences being infinite. Show that the mean of the
distribution is m, and the variance also m.

16. The equation § 15 (26), written for $a = \bar{x} = \mu'_1$, is equivalent to

$$1 + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \dots = e^{-\mu'_1 t}\left(1 + \mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots\right).$$

Equating coefficients of like powers of t, deduce the relations
§ 4 (17), (18).

17. Prove that the next two formulae corresponding to § 16 (30)
are

$$\kappa_5 = \mu_5 - 10\mu_3\mu_2, \qquad \kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3.$$

18. Two independent variates are each uniformly distributed
within the range −a to a. Show that their sum x has a probability
density given by

$$\phi(x) = (2a + x)/4a^2 \qquad (-2a \le x \le 0),$$
$$\phi(x) = (2a - x)/4a^2 \qquad (0 \le x \le 2a).$$

Verify that the m.g.f., calculated from this value of $\phi(x)$, is equal
to $\left(\dfrac{\sinh at}{at}\right)^2$.

19. On the x-axis n + 1 points are taken independently between
the origin and x = 1, all positions being equally likely. Show that
the probability that the (i + 1)th of these points, counted from the
origin, lies in the interval x − ½dx to x + ½dx is

$$(n+1)\binom{n}{i}x^i(1-x)^{n-i}\,dx.$$

Verify that the integral of this expression, from x = 0 to x = 1, is
unity. (Aitken, 1939, 1, p. 71.)

20. Show that, for the binomial distribution,

$$\kappa_4 = npq(1 - 6pq).$$

21. Show that the expected value of the product of the numbers
of points showing after an unbiased throw of n ordinary dice is $(7/2)^n$.

22. Show that, if p may be varied, the probability of m
successes in a series of n independent trials, with the same
probability p of success, is greatest when p = m/n.

23. Show that, if y and z are independent random values of a
variate x, the expected value of $(y - z)^2$ is twice the variance of the
distribution of x.

24. Defining the harmonic mean (H.M.) of a variate x as the
reciprocal of the expected value of 1/x, show that the H.M. of the
variate which ranges from 0 to ∞ with probability density $x^n e^{-x}/n!$
is n, given that n is positive.

SOME STANDARD DISTRIBUTIONS
Binomial and Poissonian Distributions
17. The binomial distribution

We have considered the binomial distribution in connection
with the probabilities of the various numbers of successes in a
series of n independent trials, in each of which the chance of success
is equal to p. The mean and the variance of the distribution were
determined by means of the property that the expected value of a
sum of variates is equal to the sum of their expected values. These
may also be found by direct calculation. Thus

$$\begin{aligned}
E(x) &= 0\cdot q^n + 1\cdot nq^{n-1}p + 2\cdot\binom{n}{2}q^{n-2}p^2 + \dots + n\,p^n \\
&= np\left[q^{n-1} + (n-1)q^{n-2}p + \dots + p^{n-1}\right] \\
&= np(q+p)^{n-1} = np.
\end{aligned} \tag{1}$$

Thus the mean of the distribution is np. In order to find the variance
calculate first the second moment about x = 0. Thus

$$\begin{aligned}
\mu'_2 &= 0^2\cdot q^n + 1^2\cdot nq^{n-1}p + 2^2\cdot\binom{n}{2}q^{n-2}p^2 + \dots + n^2p^n \\
&= np\left[q^{n-1} + 2(n-1)q^{n-2}p + 3\binom{n-1}{2}q^{n-3}p^2 + \dots + n\,p^{n-1}\right].
\end{aligned}$$

Now the expression in brackets is the first moment, about the value
x = −1, for the binomial distribution in which n is replaced by
n − 1. This first moment, being the excess of the mean of the
distribution above x = −1, is equal to (n − 1)p + 1, in virtue of (1).
Consequently

$$\mu'_2 = np[(n-1)p + 1],$$

and the variance of the binomial distribution is then given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = np[1 + (n-1)p] - (np)^2 = np(1-p) = npq, \tag{2}$$

and the standard deviation by

$$\sigma = \sqrt{npq}. \tag{3}$$

The proportion or relative frequency of successes is the number of
successes divided by n. Hence the mean value of this proportion
is p, and its standard deviation is $\sqrt{pq/n}$.

A binomial frequency distribution is one in which the relative
frequencies of the values 0, 1, 2, ..., n of the variate are equal to
their probabilities in the above distribution. As an example we may
take the distribution of expected frequencies of 0, 1, ..., n successes
when the set of n trials is to be made N times. For, in virtue of (1)
and the known probability of r successes, the expected frequency of
r successes in N sets of n trials each is $N\binom{n}{r}q^{n-r}p^r$. The properties
of the distribution are the same whether it is regarded as one of
probability or one of frequency. But the reader may note that, in
a theoretical frequency distribution like the above, the individual
frequencies are not necessarily integral.

Example 1. Verify the above value of $\mu'_2$ by direct algebraical
simplification of the expression.

Example 2. Show that the m.g.f. of the binomial distribution with respect
to its mean is

$$\left[1 + pq\frac{t^2}{2!} + pq(q^2 - p^2)\frac{t^3}{3!} + \dots\right]^n,$$

and deduce that

$$\mu_2 = npq, \qquad \mu_3 = npq(q - p), \qquad \mu_4 = npq[1 + 3(n-2)pq].$$

18. Poisson’s distribution 

An important distribution, associated with the name of Poisson,
is one obtainable from the binomial distribution by putting p = m/n,
where m is a constant, and letting n increase indefinitely. Thus the
number of trials in the series becomes very large, and the
probability of success in a trial very small. Now it can be shown* that,
on the above assumption as n tends to infinity,

$$\lim_{n\to\infty}\binom{n}{r}p^rq^{n-r} = \frac{m^re^{-m}}{r!}.$$

* See Mathematical Note I, at the end of this chapter.

Thus, in the limiting form of the distribution, the probability of r
successes in the infinite series of trials is $m^re^{-m}/r!$. The chances of
0, 1, 2, 3, ... successes in the infinite series are

$$e^{-m},\quad me^{-m},\quad \frac{m^2e^{-m}}{2!},\quad \frac{m^3e^{-m}}{3!},\quad \dots \tag{4}$$

respectively. The reader may show, as in the case of the binomial
distribution, that the most probable number of successes is the
integral part of m. It is obvious that the sum of the probabilities
of 0, 1, 2, ... successes is $e^{-m}\cdot e^{m} = 1$, as it should be.
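
The approach of the binomial probabilities to their Poissonian limit can be watched numerically. In this sketch the values m = 2 and r = 3, and the function names, are arbitrary choices made for illustration.

```python
import math

def binomial_prob(n, p, r):
    """Probability of r successes in n trials with chance p of success."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_prob(m, r):
    """Limiting probability m^r e^(-m) / r!."""
    return m**r * math.exp(-m) / math.factorial(r)

m, r = 2.0, 3
for n in (10, 100, 1000, 10000):
    # p = m/n, so that np remains equal to m as n increases
    print(n, binomial_prob(n, m / n, r), poisson_prob(m, r))
```

As n increases the binomial value in each row should draw steadily closer to the Poissonian value in the last column.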

The mean value and the variance of Poisson's distribution may be
deduced from those of the binomial by putting p = m/n, and
letting n tend to infinity. Thus the mean value is lim np = m.
Similarly, the variance is lim npq = lim mq = m, since q tends to
unity as p tends to zero. These results may, of course, be deduced
by direct calculation. Thus the mean, being the first moment about
x = 0, is given by

$$\mu'_1 = e^{-m}\left(0 + 1\cdot m + 2\cdot\frac{m^2}{2!} + 3\cdot\frac{m^3}{3!} + \dots\right) = m. \tag{5}$$

Similarly

$$\mu'_2 = e^{-m}\left(0^2 + 1^2\cdot m + 2^2\cdot\frac{m^2}{2!} + 3^2\cdot\frac{m^3}{3!} + \dots\right) = m(m+1),$$

as the reader may easily verify. Consequently

$$\sigma^2 = \mu_2 = \mu'_2 - (\mu'_1)^2 = m(m+1) - m^2 = m, \tag{6}$$

as found above. The s.d. is therefore $\sqrt{m}$.

The same results may be obtained by using the generating
functions of §§ 15 and 16. Thus the m.g.f. of the Poissonian
distribution with respect to the origin is

$$M_0(t) = e^{-m}\left(1 + e^{t}m + e^{2t}\frac{m^2}{2!} + e^{3t}\frac{m^3}{3!} + \dots\right)
= e^{-m}\exp(me^t) = \exp[m(e^t - 1)],$$

and the cumulative function is therefore

$$K(t) = m(e^t - 1) = m\left(t + \frac{t^2}{2!} + \frac{t^3}{3!} + \dots\right).$$

Thus $\kappa_1 = \kappa_2 = \kappa_3 = \dots = m$. The mean and the variance are each
equal to m, as is also the third moment about the mean. The fourth
moment about the mean is

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = m + 3m^2.$$

Further, if n independent variates $x_i$ conform to Poissonian
distributions with means $m_i$ (i = 1, 2, ..., n), it follows from the
theorem of § 15 that the m.g.f. of their sum is $\exp\left[(e^t - 1)\sum_i m_i\right]$.
But this is the m.g.f. of a Poissonian distribution whose mean is
$\sum_i m_i$. Hence the theorem:

The sum of any finite number of independent Poissonian variates
is itself a Poissonian variate, with mean equal to the sum of the means
of the separate variates.*

A Poissonian frequency distribution is one in which the relative
frequencies of the values 0, 1, 2, 3, ... of the variate are equal to the
probabilities in the above distribution. As an example we have the
distribution of the expected frequencies of the various numbers of
successes when the extensive series of trials is repeated N times.
But in such a theoretical distribution the individual frequencies
are not necessarily integral. Frequency distributions which are
approximately Poissonian do arise in connection with the number
of happenings of a rare event in an extensive series of trials.

Example. In 1,000 consecutive issues of The Utopian Seven-daily Chronicle
the deaths of centenarians were recorded,† the number x having frequency
f according to the table

x:   0    1    2    3    4    5    6    7    8
f: 229  325  257  119   50   17    2    1    0

Show that the distribution is roughly Poissonian by calculating its mean
(m = 1.5), and then the frequencies in the Poissonian distribution with the
same mean and the same total frequency of 1,000. The latter are
approximately 223.1, 334.7, 251.0, 125.5, 47.1, 14.1, 3.5, 0.8, 0.2. Also calculate the
variance of the given distribution, and compare it with the mean.
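
The arithmetic of this Example may be carried out as in the sketch below, which takes the observed frequencies as 229, 325, 257, 119, 50, 17, 2, 1, 0 (total 1,000) for x = 0, 1, ..., 8. The variable names are arbitrary. Each expected frequency is computed for a single value of x, so that the last entry refers to exactly eight deaths and not to eight or more.

```python
import math

x_values = range(9)
freq = [229, 325, 257, 119, 50, 17, 2, 1, 0]

total = sum(freq)
mean = sum(x * f for x, f in zip(x_values, freq)) / total
variance = sum(f * (x - mean) ** 2 for x, f in zip(x_values, freq)) / total

# Expected Poissonian frequencies with the same mean and the same total.
expected = [total * mean**x * math.exp(-mean) / math.factorial(x) for x in x_values]

print(total, mean, variance)
print([round(e, 1) for e in expected])
```

The variance so found may then be compared with the mean, as the Example requires.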

* An elementary proof of this theorem will be found in Ex. 4 at the end
of this chapter.

† Cf. Lucy Whittaker, Biometrika, vol. x, 1914, p. 36.


The Normal Distribution 


19. Derivation from the binomial distribution 


The binomial and Poissonian are distributions of discrete values. 
We pass now to a continuous distribution of fundamental import- 
ance. This is the normal distribution, which may be derived from the 
binomial in the following manner. In the latter the probability of 


the value r of the variate ^ ^ ^ j Let x be the deviation of 

the variate from the mean value np of r, so that 


r = np-\-x. (7) 

Then the probability of the value x, being the same as that of the 
corresponding value of r, is 


P = 




nnv^XQnq-x 


It can be shown* that, for large values of n, this probability can be 
expressed as 

$$P = \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{x^2}{2npq}\right)(1+\epsilon), \tag{8}$$

where ε tends to zero as n tends to infinity, provided that neither 
p nor q is very small, and x is of lower order than $n^{2/3}$. 
Now introduce a variable z defined by 

$$z = \frac{x}{\sqrt{n}},$$

and examine the probability distribution of z, and its limiting form 
as n tends to infinity. The probability for any interval in the range 
of z is equal to that of the corresponding interval for x. To unit 
interval in the range of x there corresponds the interval $1/\sqrt{n}$ in 
that of z; and, when n tends to infinity, this may be denoted by dz. 
Then the probability that z falls in the interval dz is the probability 
that x falls in the unit interval which includes the value x; and this is 
given by (8) which, when n tends to infinity, takes the form 

$$dP = \frac{1}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right) dz.$$

* For the details of this step see Mathematical Note II, at the end of this 
chapter. 

The limiting form of the distribution of z is thus a continuous 
distribution with probability density 

$$\phi(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{z^2}{2\sigma^2}\right), \tag{9}$$

where σ² = pq. This is the normal probability function, and the 
corresponding continuous distribution is the normal distribution. 
Since $\sqrt{npq}$ is the s.d. of the binomial distribution, and therefore 
of the variate x, it follows that $\sqrt{pq}$ is the s.d. of z. Hence σ in (9) 
is the s.d. of the normal distribution. Since φ(z) is an even function 
of z, the distribution is symmetrical about z = 0, which is therefore 
the mean value of the distribution. Also, since z must lie between 
−∞ and ∞, the integral of (9) between these limits is equal to 
unity, so that 

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}. \tag{10}$$

This is an important integral. An independent proof of the formula 
is given in Mathematical Note III, at the end of this chapter. 

A normal frequency distribution is a continuous one, in which 
the relative frequency density f(x) is identical with the function φ(x) 
defined by (9). A distribution of discrete values cannot be normal. 
If, however, the total frequency is large, it is possible for the discrete 
distribution to approximate to the normal. The meaning of such an 
approximation was explained in § 6. 

In later chapters, particularly in connection with sampling theory, 
we shall meet distributions which are accurately normal, and others 
which are approximately so. Since the normal distribution is an 
ideal to which some distributions attain, and to which others 
approximate, it is important to know its properties. Fortunately, 
the quantities associated with the distribution are easy to calculate. 
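
The passage from the binomial probability to the normal form (8) can be illustrated numerically. The Python sketch below is an illustration only, under assumed values n = 400 and p = 1/2 that do not come from the text; the function names are hypothetical. It compares the exact binomial probability of r successes with the normal expression evaluated at x = r − np.

```python
import math


def binomial_pmf(n, p, r):
    """Exact binomial probability of r successes in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def normal_approx(n, p, r):
    """Normal expression exp(-x^2 / 2npq) / sqrt(2 pi npq), with x = r - np."""
    q = 1 - p
    x = r - n * p
    return math.exp(-x * x / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)


n, p = 400, 0.5  # assumed illustrative values
for r in (200, 210, 220):
    print(r, binomial_pmf(n, p, r), normal_approx(n, p, r))
```

With these values the two columns should agree to within about one per cent, and the agreement improves as n is increased with x/√n held fixed.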

20. Some properties of the normal distribution 

As we have just seen, a continuous variate x is normally 
distributed, with mean zero and s.d. σ, when the range of the variate 
is from −∞ to ∞, and the probability density is given by 

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{11}$$

The probability curve is therefore the curve 

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{12}$$

This curve is symmetrical about the line x = 0 through the mean of 
the distribution. It is a uni-modal curve, in which the ordinates 
decrease rapidly as |x| increases. By equating to zero the second 
derivative of y, the reader will easily verify that the points of 
inflexion on the curve are given by x = ±σ. The ordinates of the 
normal curve are given in Table 1, corresponding to values of x/σ 
at intervals of 0.01. 



Fig. 3. The Normal Probability Curve 


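Since the distribution is symmetrical, the moments of odd order 
about the mean are all zero. To find the moments of even order we 
observe that the moment of order 2n about the mean is 

$$\mu_{2n} = \int_{-\infty}^{\infty} x^{2n}\phi(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n-1}\cdot x\exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Integration by parts then gives 

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \left\{ \left[-\sigma^2 x^{2n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right]_{-\infty}^{\infty} + (2n-1)\sigma^2 \int_{-\infty}^{\infty} x^{2n-2} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx \right\}.$$

The expression in square brackets vanishes at both limits; and we 
may therefore write the relation 
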
$$\mu_{2n} = (2n-1)\,\sigma^2\,\mu_{2n-2}.$$

Table 1. Ordinates of the Normal Curve 

The origin of x is at the mean. The table gives the values of 
$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$, which is σ times the ordinate of the normal curve y = φ(x). 

x/σ    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09 

0.0   .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973 
0.1   .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918 
0.2   .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825 
0.3   .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697 
0.4   .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538 
0.5   .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352 
0.6   .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144 
0.7   .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920 
0.8   .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685 
0.9   .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444 
1.0   .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203 
1.1   .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965 
1.2   .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736 
1.3   .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518 
1.4   .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315 
1.5   .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127 
1.6   .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957 
1.7   .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804 
1.8   .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669 
1.9   .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551 
2.0   .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449 
2.1   .0440  .0431  .0422  .0413  .0404  .0395  .0387  .0379  .0371  .0363 
2.2   .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290 
2.3   .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229 
2.4   .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180 
2.5   .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139 
2.6   .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107 
2.7   .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081 
2.8   .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061 
2.9   .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046 
3.0   .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034 
3.1   .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025 
3.2   .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018 
3.3   .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013 
3.4   .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009 
3.5   .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006 
3.6   .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004 
3.7   .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003 
3.8   .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002 
3.9   .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001 

Repeated application of this reduction formula shows that 

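$$\mu_{2n} = (2n-1)(2n-3)\cdots 3\cdot 1\cdot \sigma^{2n}\mu_0.$$

But, for any distribution, $\mu_0 = 1$, and therefore 

$$\mu_{2n} = (2n-1)(2n-3)\cdots 3\cdot 1\cdot \sigma^{2n}. \tag{13}$$

In particular, the variance is σ² and the s.d. is σ, as proved above. 
Similarly, 

$$\mu_4 = 3\sigma^4. \tag{14}$$

The moments may also be obtained from the moment generating 
function of the normal distribution. This function, relative to the 
mean x = 0, is 

$$M(t) = E(e^{tx}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{x^2}{2\sigma^2}\right) dx = \exp\left(\tfrac{1}{2}\sigma^2 t^2\right) \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{(x-\sigma^2 t)^2}{2\sigma^2}\right) dx = \exp\left(\tfrac{1}{2}\sigma^2 t^2\right), \tag{15}$$

in virtue of (10). Since the expansion of this expression involves 
only even powers of t, the moments of odd order about the mean are 
zero, as is obvious from symmetry. The moment of order 2n is the 
coefficient of $t^{2n}/(2n)!$ in the expansion; and this has the value 

$$\mu_{2n} = \left(\tfrac{1}{2}\sigma^2\right)^n (2n)!/n! = 1\cdot 3\cdot 5 \cdots (2n-1)\,\sigma^{2n}$$

as found above. The cumulative function is, in virtue of (15), 

$$K(t) = \log M(t) = \tfrac{1}{2}\sigma^2 t^2,$$

so that all the cumulants after the second are equal to zero. 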
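
Formula (13) lends itself to a numerical check. The Python sketch below is an illustration only; the midpoint-rule integration, the truncation of the range at twelve standard deviations, and the value σ = 1.5 are assumptions made for the example and are not taken from the text.

```python
import math


def even_moment_formula(n, sigma):
    """mu_{2n} = (2n-1)(2n-3)...3.1 * sigma^(2n), as in formula (13)."""
    product = 1
    for k in range(1, 2 * n, 2):  # odd factors 1, 3, ..., 2n-1
        product *= k
    return product * sigma ** (2 * n)


def moment_numeric(order, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of x^order * phi(x).

    The infinite range is truncated at +/- half_width standard deviations,
    beyond which the integrand is negligible.
    """
    lower = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += x ** order * math.exp(-x * x / (2 * sigma ** 2))
    return total * h / (sigma * math.sqrt(2 * math.pi))


sigma = 1.5  # assumed illustrative value
for n in (1, 2, 3):
    print(2 * n, even_moment_formula(n, sigma), moment_numeric(2 * n, sigma))
print(3, moment_numeric(3, sigma))  # an odd moment, expected to be near zero
```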

The mean deviation from the mean is easily calculated. Its value is 

$$\int_{-\infty}^{\infty} |x|\,\phi(x)\,dx = 2\int_0^{\infty} x\,\phi(x)\,dx$$

in virtue of symmetry; and this integral 

$$= \sigma\sqrt{\frac{2}{\pi}} = 0.7979\,\sigma,$$

or 4σ/5 approximately. 

In a normal distribution with mean x̄ and s.d. σ, x − x̄ is the 
deviation of the variable from the mean, and the probability 
density is therefore 

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\bar{x})^2}{2\sigma^2}\right).$$

Conversely, a probability density of this form defines a normal 
distribution with mean x̄ and s.d. σ. 

21. Probabilities and relative frequencies for various intervals 

The probability corresponding to any interval in the range of 
the variate is represented by the area under the curve (12) within 
that interval; and the same is true for the relative frequency in the 
case of a normal frequency distribution. In particular, the 
probability for the interval from the mean (zero) to the value x is given 
by the integral 

$$\frac{1}{\sigma\sqrt{2\pi}} \int_0^x \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Putting t = x/σ we see that this is equivalent to 

$$\frac{1}{\sqrt{2\pi}} \int_0^{x/\sigma} \exp\left(-\tfrac{1}{2}t^2\right) dt. \tag{17}$$

This probability is therefore a function of x/σ. In the 
accompanying table (Table 2) the values of this integral are given for different 
values of x/σ at intervals of 0.01. 

Using Table 2 the reader will see that, if x/σ = 1, the area is 
0.3413. The area within the interval x = −σ to x = σ is therefore 
0.6826, and the area outside this interval 0.3174, which is less than 
1/3. In other words, the probability that a random value of a normal 
variate will deviate more than σ from the mean is less than 1/3. 
Similarly, for x/σ = 2 the value of the integral (17) is 0.4772, and 
the area for the interval x = −2σ to x = 2σ is 0.9544. The area 
outside this interval is thus 0.0456. The probability that a random 
value of x will deviate more than 2σ from the mean is thus about 
4½ %. Similarly, the area outside the range x = −2.5σ to x = 2.5σ 
is 0.0124, which is about 1¼ % of the whole; and that outside the 
range x = −3σ to x = 3σ is only 0.0027, or about ¼ % of the whole. 
The reader may verify, and will find it convenient to remember, 
that the deviation from the mean which is exceeded with a 
probability of 5 % is 1.96σ; and that which is exceeded with a 
probability of 1 % is 2.58σ. 
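
The areas quoted in this section can be reproduced without a table. The Python sketch below is an illustration only; it relies on the standard identity that the area from the mean to t standard deviations equals ½ erf(t/√2), and the function names are hypothetical.

```python
import math


def area_from_mean(t):
    """Area under the normal curve from the mean to t standard deviations."""
    return 0.5 * math.erf(t / math.sqrt(2))


def prob_outside(t):
    """Probability that a normal variate deviates more than t s.d. from the mean."""
    return 1 - 2 * area_from_mean(t)


for t in (1, 1.96, 2, 2.5, 2.58, 3):
    print(t, round(area_from_mean(t), 4), round(prob_outside(t), 4))
```

Small differences in the last figure from the values quoted in the text arise because the text doubles tabular entries that are already rounded to four places.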

Table 2. Area under the Normal Curve 

The area is measured from the mean, x = 0, to any ordinate x. 
The results are given for values of x/σ at intervals of 0.01. 

x/σ    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09 

0.0   .0000  .0040  .0080  .0120  .0159  .0199  .0239  .0279  .0319  .0359 
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753 
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141 
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517 
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879 
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224 
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549 
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852 
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133 
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389 
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621 
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830 
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015 
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177 
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319 
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4430  .4441 
1.6   .4452  .4463  .4474  .4485  .4495  .4505  .4515  .4525  .4535  .4545 
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633 
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706 
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4762  .4767 
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817 
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857 
2.2   .4861  .4865  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890 
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916 
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936 
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952 
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964 
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974 
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4980  .4980  .4981 
2.9   .4981  .4982  .4983  .4983  .4984  .4984  .4985  .4985  .4986  .4986 
3.0   .49865 .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990 
3.1   .49903 .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993 


From Table 2 we may find the quartiles of the normal 
distribution with mean at x = 0. The relative frequency from the mean to 
the upper quartile is 0.25, which is therefore the corresponding area 
under the curve. Interpolating with the aid of Table 2 we find 
x/σ = 0.6745, and the upper quartile is therefore 0.6745σ. 
Similarly, the lower quartile is −0.6745σ. 

Example 1. Using Table 2 show that the 6th, 7th, 8th and 9th deciles 
are 0.253σ, 0.525σ, 0.842σ and 1.282σ respectively. 

Example 2. For a normal distribution, with mean x̄ = 1 and s.d. 3, find 
the probabilities for the intervals 

(i) x = 3.43 to x = 6.19, (ii) x = −1.43 to x = 6.19. 

Let x' denote the deviation from the mean. Then, in the first part of the 
example, the values of x'/σ for the bounds of the interval are 0.81 and 1.73. 
The areas from the mean to the bounds of the interval are 0.2910 and 0.4582. 
The area corresponding to the interval is the difference of these areas, and 
is therefore 0.1672. 

In the second part the values of x'/σ corresponding to the bounds of the 
interval are −0.81 and 1.73. The area from the lower bound to the mean is 
0.2910, and that from the mean to the upper bound is 0.4582 as before. 
The required area is the sum of these, viz. 0.7492. 
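
The quartiles, deciles and interval probabilities of these examples may also be obtained by machine. The Python sketch below is an illustration only; it uses `statistics.NormalDist` from the Python standard library (version 3.8 or later) in place of interpolation in Table 2, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Standard normal variate (mean 0, s.d. 1): quartile and deciles in units of sigma.
std = NormalDist()
upper_quartile = std.inv_cdf(0.75)
deciles = [std.inv_cdf(k / 10) for k in (6, 7, 8, 9)]
print("upper quartile:", round(upper_quartile, 4))
print("6th to 9th deciles:", [round(d, 3) for d in deciles])

# Example 2: mean 1, s.d. 3.
dist = NormalDist(mu=1, sigma=3)
p_i = dist.cdf(6.19) - dist.cdf(3.43)    # interval (i)
p_ii = dist.cdf(6.19) - dist.cdf(-1.43)  # interval (ii)
print("(i) ", round(p_i, 4))
print("(ii)", round(p_ii, 4))
```

The exact seventh decile is slightly below the value found by linear interpolation in a four-figure table, so a difference in the third decimal place is to be expected.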


22. Distribution of a sum of independent normal variates 

Consider the normal variate x, with mean a and variance σ². 
Its m.g.f. with respect to the origin is 

$$M(t) = E(e^{tx}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{(x-a)^2}{2\sigma^2}\right) dx = \exp\left(at + \tfrac{1}{2}\sigma^2 t^2\right), \tag{18}$$

in virtue of (10). The cumulative function is 

$$K(t) = \log M(t) = at + \tfrac{1}{2}\sigma^2 t^2.$$

If c is a constant, cx is normally distributed with mean ca and 
variance c²σ². Hence the m.g.f. of the distribution of cx is 

$$\exp\left(cat + \tfrac{1}{2}c^2\sigma^2 t^2\right).$$

Let $x_i$ (i = 1, ..., n) be n independent normal variates, with means 
$a_i$ and variances $\sigma_i^2$. Then, by the theorem of § 15, the m.g.f. of the 
sum $\sum c_i x_i$ is 

$$\exp\left[t\sum c_i a_i + \tfrac{1}{2}t^2 \sum c_i^2 \sigma_i^2\right].$$

But this is the m.g.f. of a normal distribution with mean $\sum c_i a_i$ and 
variance $\sum c_i^2\sigma_i^2$. Hence the theorem: 

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed 
with means $a_i$ and variances $\sigma_i^2$, the variate $\sum c_i x_i$ is normally 
distributed with mean $\sum c_i a_i$ and variance $\sum c_i^2\sigma_i^2$. 

In particular, the sum (or the difference) of two independent 
normal variates is normally distributed, with variance equal to 
the sum of their variances. Also, if in the above theorem we put 

$$a_i = a, \quad \sigma_i = \sigma, \quad c_i = 1/n \quad (i = 1, 2, \ldots, n),$$

we have the important result: 

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed 
about a common mean, a, with a common variance, σ², their mean is 
also normally distributed about a, but with variance σ²/n. 
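
The last result may be illustrated by a small sampling experiment. The Python sketch below is an illustration only; the values n = 4, a = 10, σ = 3, the number of trials and the random seed are assumptions made for the example, and the function names are hypothetical. It evaluates the mean and variance given by the theorem and compares them with those of simulated sample means.

```python
import random
import statistics


def linear_combination(coeffs, means, variances):
    """Mean and variance of sum(c_i * x_i) for independent variates x_i."""
    mean = sum(c * a for c, a in zip(coeffs, means))
    variance = sum(c * c * v for c, v in zip(coeffs, variances))
    return mean, variance


def simulate_sample_means(n, a, sigma, trials, seed=0):
    """Draw `trials` samples of n normal values and return the sample means."""
    rng = random.Random(seed)
    return [sum(rng.gauss(a, sigma) for _ in range(n)) / n for _ in range(trials)]


n, a, sigma = 4, 10.0, 3.0  # assumed illustrative values
mean, variance = linear_combination([1 / n] * n, [a] * n, [sigma ** 2] * n)
sample_means = simulate_sample_means(n, a, sigma, trials=20000)

print("theory:    ", mean, variance)
print("simulation:", statistics.fmean(sample_means), statistics.pvariance(sample_means))
```

The simulated values should lie close to a and σ²/n, the agreement being subject to the ordinary fluctuations of sampling.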

Other proofs of the above theorem will be found in Ex. 8 at the end 
of this chapter. 

EXAMPLES III 

1. Show by direct calculation that the third moment of the 
binomial distribution about x = 0 is 

$$\mu'_3 = np\left[(n-1)(n-2)p^2 + 3(n-1)p + 1\right],$$

and deduce that $\mu_3 = npq(q-p)$. Similarly, show that 

$$\mu_4 = npq\left[1 + 3(n-2)pq\right].$$

2. Show that, for Poisson's distribution, the third and fourth 
moments about x = 0 are given by 

$$\mu'_3 = m^3 + 3m^2 + m, \qquad \mu'_4 = m^4 + 6m^3 + 7m^2 + m,$$

and deduce that 

$$\mu_3 = m, \qquad \mu_4 = 3m^2 + m.$$

Deduce these results also as limiting values of the corresponding 
moments in Ex. 1. 

3. In a normal distribution, whose mean is 2 and s.d. 3, find a 
value of the variate such that the probability of the interval from 
the mean to that value is 0.4115. (Ans. x = 6.05.) Find another 
value such that the probability for the interval from x = 3.5 to 
that value is 0.2307. (Ans. x = 6.26.) 

4. If two independent variates, x and y, have Poissonian 
distributions with means $m_1$ and $m_2$, their sum is a Poissonian variate with 
mean $m_1 + m_2$. (Cf. § 18.) 

The variates take only the values 0, 1, 2, 3, .... We require the 
probability that their sum will take the value r. The probability 
that simultaneously x will have the value s and y the value r − s is 

$$\frac{m_1^s \exp(-m_1)}{s!} \times \frac{m_2^{r-s} \exp(-m_2)}{(r-s)!}.$$

Summing for all values of s from 0 to r, we see that the probability 
that x + y will have the value r is 

$$\exp(-m_1 - m_2) \sum_{s=0}^{r} \frac{m_1^s\, m_2^{r-s}}{s!\,(r-s)!} = \frac{(m_1 + m_2)^r \exp(-m_1 - m_2)}{r!}.$$

Consequently x + y is a Poissonian variate with mean $m_1 + m_2$. It 
follows that, if the independent variates $x_i$ have Poissonian 
distributions with means $m_i$ (i = 1, ..., n), their sum is a Poissonian variate 
with mean $\sum m_i$. 
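
The summation in this proof can be checked term by term. The Python sketch below is an illustration only; the means 1.2 and 2.3 are assumed values and the function names are hypothetical. It forms the probability of the sum by adding the products of the separate probabilities, and compares it with the Poissonian probability for the mean $m_1 + m_2$.

```python
import math


def poisson_pmf(m, r):
    """Probability of the value r in a Poissonian distribution with mean m."""
    return math.exp(-m) * m ** r / math.factorial(r)


def sum_pmf(m1, m2, r):
    """Probability that x + y = r, summing over x = s and y = r - s."""
    return sum(poisson_pmf(m1, s) * poisson_pmf(m2, r - s) for s in range(r + 1))


m1, m2 = 1.2, 2.3  # assumed illustrative means
for r in range(6):
    print(r, sum_pmf(m1, m2, r), poisson_pmf(m1 + m2, r))
```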

5. Consider the normal distribution which has the same mean 
(3.95 lb.) and the same s.d. (0.46 lb.) as the distribution of yields 
of grain in Ex. I, 2. Find the relative frequencies of the normal 
distribution for the intervals corresponding to the various classes, 
and deduce the class frequencies per 500 of the normal distribution. 
This process is sometimes spoken of as fitting a normal distribution 
to the data. 

Take, for example, the class whose limits are 4.1 and 4.3 lb. 
Since the mean is 3.95 we have, for the lower limit, x/σ = 0.15/0.46 
= 0.3261. The area from the mean to this limit is therefore 0.1278, 
and the area to the right of it is 0.3722. Similarly, for the upper 
limit x/σ = 0.35/0.46 = 0.7609; and the area to the right of this is 
0.2233. The difference of the areas to the right of the limits is that 
corresponding to the interval. Its value is 0.1489. This is the relative 
frequency for that interval. 

The work can be tabulated as below. Here x denotes the deviation 
of a class limit from the mean 3.95 lb. Knowing that σ = 0.46 lb. 
we can find x/σ for each limit. The third column gives the area under 
the normal curve, to the right of the class limit. The differences, 
recorded in the next column, are the relative frequencies for the 
various classes. These differences, multiplied by 500, give the normal 
frequencies of the classes per 500 of the total. The last column 
records the class frequencies in the given distribution. The lower 
limit of the lowest class is taken as −∞, and the upper limit of the 
highest class as ∞, so as to include the whole of the normal distribution. 


Fitting a normal distribution to that of 500 yields of grain 

Each row refers to the class whose lower limit is shown; the 
difference d is taken between the area in that row and the area in the next. 

Class      x/σ       Area       Difference    500d    Observed 
limit                to right       d                     f 

 −∞         −∞       1.0000      0.0112        5.6        4 
 2.9     −2.2826     0.9888      0.0212       10.6       15 
 3.1     −1.8479     0.9676      0.0464       23.2       20 
 3.3     −1.4130     0.9212      0.0851       42.6       47 
 3.5     −0.9783     0.8361      0.1295       64.7       63 
 3.7     −0.5435     0.7066      0.1633       81.6       78 
 3.9     −0.1087     0.5433      0.1712       85.6       88 
 4.1      0.3261     0.3722      0.1489       74.4       69 
 4.3      0.7609     0.2233      0.1073       53.7       59 
 4.5      1.1956     0.1160      0.0644       32.2       35 
 4.7      1.6304     0.0516      0.0322       16.1       10 
 4.9      2.0652     0.0194      0.0132        6.6        8 
 5.1      2.5000     0.0062      0.0062        3.1        4 
  ∞         ∞        0.0000 

6. As in the previous example, fit a normal distribution to the 
distribution of wages of 1,000 employees given in Ex. I, 3, showing 
that the class frequencies per thousand of the normal distribution 
are approximately 6.7, 11.3, 25.0, 48.0, 79.0, 113.1, 140.5, 151.0, 
140.8, 113.5, 79.5, 48.1, 25.3, 11.5 and 6.7. 

7. In 1,000 extensive sets of trials for an event of small probability, 
the frequencies f of the numbers x of successes proved to be 

x:    0    1    2    3    4    5    6    7 

f:  305  365  210   80   28    9    2    1 

Show that the mean number of successes is 1.2, and hence that the 
frequencies of the Poissonian distribution, with the same mean and 
the same total frequency, are approximately 301.2, 361.4, 216.8, 
86.7, 26.0, 6.2, 1.2, 0.2. Verify that the variance of the given 
distribution is 1.28. 


8. Sum of independent normal variates. Another proof of the 
theorem of § 22 may be given as follows. Let x and y be the 
independent normal variates, with a common mean zero and variances 
$\sigma_1^2$ and $\sigma_2^2$ respectively; and let u = x + y. Then, for a fixed value of y, 
du = dx. The probability that the value of x will fall in the interval 
dx is φ(x)dx; and therefore, for a fixed value of y in the interval dy, 
the probability that u will fall in the interval du is 

$$dp_1 = \frac{du}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(u-y)^2}{2\sigma_1^2}\right].$$

But the probability of y falling in the interval dy is 

$$dp_2 = \frac{dy}{\sigma_2\sqrt{2\pi}} \exp\left[-\frac{y^2}{2\sigma_2^2}\right];$$

and the compound probability of u falling in the interval du and, 
at the same time, y in the interval dy is the product $dp_1\,dp_2$. 
Integrating with respect to y over its range of variation, we have the 
probability that u will fall in the interval du, irrespective of the 
value of y, as 

$$dp = \frac{du}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\left[-\frac{(u-y)^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right] dy.$$

In this expression the argument of the exponential may be put in 
the form 

$$-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)} - \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2} \left(y - \frac{u\sigma_2^2}{\sigma_1^2+\sigma_2^2}\right)^2.$$

If then we change the variable of integration from y to t, where 

$$t = y - \frac{u\sigma_2^2}{\sigma_1^2+\sigma_2^2},$$

we have 

$$dp = \frac{du}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right] \int_{-\infty}^{\infty} \exp\left[-\frac{(\sigma_1^2+\sigma_2^2)\,t^2}{2\sigma_1^2\sigma_2^2}\right] dt = \frac{du}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right],$$

in virtue of (10). Thus u is normally distributed about zero as 
mean, with variance $\sigma_1^2 + \sigma_2^2$. The reader should have no difficulty 
in removing the restriction of a common zero mean for the 
variates. 

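The following is still another proof of the theorem. With the 
same notation let 

$$u = x + y, \qquad v = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y.$$

Then 

$$\frac{u^2 + v^2}{2(\sigma_1^2+\sigma_2^2)} = \frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2};$$

and, as the values of x and y range from −∞ to ∞, so do those of 
u and v. The probability that x and y fall simultaneously in the 
respective intervals dx and dy is 

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\left(\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2}\right)\right] dx\,dy,$$

so that the probability density for the joint distribution of x and y, 
i.e. the probability per unit area at the point (x, y), is 

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\left(\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2}\right)\right].$$

Now the area of the element of the xy plane, bounded by the curves 
along which u has the values u and u + du respectively, and those 
along which v has the values v and v + dv, is 

$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv = \frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\, du\,dv,$$

where $\frac{\partial(x, y)}{\partial(u, v)}$ is the Jacobian of x and y with respect to u and v. 
Consequently the probability that, for a random choice of x and y, 
the representative point (x, y) will fall in this element of area is 

$$\frac{1}{2\pi(\sigma_1^2+\sigma_2^2)} \exp\left[-\frac{u^2+v^2}{2(\sigma_1^2+\sigma_2^2)}\right] du\,dv = \left\{\frac{du}{\sigma\sqrt{2\pi}} \exp\left(-\frac{u^2}{2\sigma^2}\right)\right\} \left\{\frac{dv}{\sigma\sqrt{2\pi}} \exp\left(-\frac{v^2}{2\sigma^2}\right)\right\},$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Since this is the probability that simultaneously 
u will lie in the interval du and v in the interval dv, it follows that 
these variates are independent, and that each is normally distributed 
with zero as mean and $\sigma_1^2 + \sigma_2^2$ as variance. 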

9. Using the formulae of Ex. I, 11 show that, for the binomial 
distribution, the factorial moments about the mean are 

$$\mu_{(2)} = npq, \qquad \mu_{(3)} = -2npq(p+1),$$

and that, for the Poissonian distribution, 

$$\mu_{(2)} = m, \qquad \mu_{(3)} = -2m, \qquad \mu_{(4)} = 3m(m+2).$$

MATHEMATICAL NOTES 
NOTE I 


Derivation of the Poissonian distribution (see §18) 

In the binomial distribution the probability of the value r of 
the variate is $\binom{n}{r} p^r q^{n-r}$. We require the limiting value of this 
expression when p = m/n and n tends to infinity, m being constant. 
The expression may be written 

$$\frac{n!}{r!\,(n-r)!} \left(\frac{m}{n}\right)^r \left(1 - \frac{m}{n}\right)^{n-r} = \frac{m^r}{r!}\left(1 - \frac{m}{n}\right)^n \times \frac{n!}{(n-r)!\; n^r (1 - m/n)^r}.$$

The limiting value of that part which precedes the sign of 
multiplication is $m^r e^{-m}/r!$. Then, using Stirling's formula for n!, viz. 

$$n! \sim \sqrt{2\pi n}\; n^n e^{-n}$$

(the limiting value of the ratio of these two expressions, as n tends 
to infinity, being unity), we have 

$$\lim \frac{n!}{(n-r)!\; n^r (1 - m/n)^r} = \lim \frac{\sqrt{2\pi n}\; n^n e^{-n}}{\sqrt{2\pi(n-r)}\,(n-r)^{n-r} e^{-(n-r)}\; n^r (1 - m/n)^r} = \lim \frac{1}{e^r (1 - r/n)^{n-r+\frac{1}{2}} (1 - m/n)^r} = 1,$$

since r is finite. Consequently 

$$\lim \binom{n}{r} p^r q^{n-r} = \frac{m^r e^{-m}}{r!},$$

which is the required probability of the value x = r in the Poissonian 
distribution with mean m. 
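
The approach to this limit is easily observed numerically. The Python sketch below is an illustration only; the values m = 1.5 and r = 2 are assumed for the example and the function names are hypothetical. It evaluates the binomial probability with p = m/n for increasing n and sets it beside the Poissonian value.

```python
import math


def binomial_pmf(n, p, r):
    """Exact binomial probability of r successes in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_pmf(m, r):
    """Probability of the value r in a Poissonian distribution with mean m."""
    return math.exp(-m) * m ** r / math.factorial(r)


m, r = 1.5, 2  # assumed illustrative values
for n in (10, 100, 1000, 10000):
    print(n, binomial_pmf(n, m / n, r), poisson_pmf(m, r))
```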


NOTE II 

Derivation of the normal distribution (see §19) 

To examine the behaviour of the probability 

$$P = \frac{n!}{(np+x)!\,(nq-x)!}\; p^{np+x}\, q^{nq-x}$$

as n tends to infinity, replace the factorials by their values given 
by Stirling's formula (Note I). Then an easy algebraical 
simplification leads to 

$$P = \frac{1}{N\sqrt{2\pi npq}},$$

where 

$$N = \left(1 + \frac{x}{np}\right)^{np+x+\frac{1}{2}} \left(1 - \frac{x}{nq}\right)^{nq-x+\frac{1}{2}}.$$

Taking logarithms of both sides we may write, provided |x| is less 
than the smaller of the two quantities np and nq, 

$$\log N = \frac{x}{2n}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{x^2}{2npq} - \frac{x^2}{4n^2}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{x^3}{6n^2}\left(\frac{1}{q^2} - \frac{1}{p^2}\right)$$

+ terms with higher powers of 1/n. 

Introducing a new variable, $z = x/\sqrt{n}$, we may write the above 

$$\log N = \frac{z}{2\sqrt{n}}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{z^2}{2pq} - \frac{z^2}{4n}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{z^3}{6\sqrt{n}}\left(\frac{1}{q^2} - \frac{1}{p^2}\right) + \cdots$$

This series is convergent so long as |z| is less than the smaller 
of the quantities $p\sqrt{n}$ and $q\sqrt{n}$. Now let n tend to infinity. Then, 
provided that neither p nor q is very small, and that |z| is either 
finite or of lower order than $n^{1/6}$, all the terms of the above series tend 
to zero except $z^2/2pq$; so that, when n tends to infinity, we have 

$$\log N = \frac{z^2}{2pq}$$

or 

$$N = \exp\left(\frac{z^2}{2pq}\right).$$

Corresponding to unit increment in x we have the increment $1/\sqrt{n}$ 
in z, which may be denoted by dz when n tends to infinity. And, if 
we write dP for the limiting value of P, the above formula for P 
becomes 

$$dP = \frac{1}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right) dz,$$

giving the probability of z falling in the interval dz; and we have 
the required continuous distribution of z. 


NOTE III 

An important integral 
To evaluate the integral 

$$I = \int_0^{\infty} \exp(-x^2)\,dx,$$

let 

$$I_a = \int_0^a \exp(-x^2)\,dx = \int_0^a \exp(-y^2)\,dy.$$

Then 

$$I_a^2 = \int_0^a\!\!\int_0^a \exp(-x^2 - y^2)\,dx\,dy.$$

If we regard x, y as Cartesian coordinates in a plane, the integration 
is extended over the area of the square OACB, bounded by x = 0, 
x = a, y = 0, y = a (see Fig. 4). In polar coordinates the above 
relation is 

$$I_a^2 = \iint \exp(-r^2)\, r\,dr\,d\theta,$$

the integration being extended over the same square. Since the 
integrand is positive, the integral is intermediate in value between 
the integrals over the quadrants of circles, with centre O and radii 
a and $a\sqrt{2}$ respectively. Consequently $I_a^2$ lies in value between 

$$\int_0^{\pi/2} d\theta \int_0^a r\exp(-r^2)\,dr \quad \text{and} \quad \int_0^{\pi/2} d\theta \int_0^{a\sqrt{2}} r\exp(-r^2)\,dr.$$

But, as a tends to infinity, each of these integrals converges to ¼π. 
Hence 

$$I = \int_0^{\infty} \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi}.$$

CHAPTER IV 


BIVARIATE DISTRIBUTIONS. 
REGRESSION AND CORRELATION 

23. Discrete distributions. Moments 

Suppose that we have records for N marriages, giving the ages of 
bridegroom and bride, x and y years respectively. Then to each 
marriage corresponds a pair of values $(x_i, y_i)$ of the variables. Each 
pair may be represented by a point in the xy plane, the 
coordinates of the point being the pair of values represented. Such a 
graphical representation is called a scatter diagram. Possibly the 
pairs of values are not all different. If the pair $(x_i, y_i)$ occurs $f_i$ times, 
then $f_i$ is the frequency of that pair; and, as in the case of a single 
variable, 

$$\sum_{i=1}^{n} f_i = N, \tag{1}$$

n being the number of different pairs of values. The assemblage of 
pairs of values, together with their frequencies, constitutes a 
bivariate frequency distribution. Other examples of bivariate 
distributions are furnished by the heights and the weights of a group 
of men, or by the amount of fertilizer per acre and the yield of grain 
per acre on a number of different plots of land. 

The moments of a bivariate distribution are generalizations of 
those of a univariate one. Thus the moment $\mu'_{rs}$ about the origin 
(0, 0), of order r in x and s in y, is defined by 

$$\mu'_{rs} = \frac{1}{N}\sum f_i x_i^r y_i^s. \tag{2}$$

In particular 

$$\mu'_{10} = \frac{1}{N}\sum f_i x_i = \bar{x}, \qquad \mu'_{01} = \frac{1}{N}\sum f_i y_i = \bar{y}, \tag{3}$$

where x̄ is the mean value of x in the distribution, and ȳ the mean 
value of y. Similarly, 

$$\mu'_{20} = \frac{1}{N}\sum f_i x_i^2 = \sigma_x^2 + \bar{x}^2, \qquad \mu'_{02} = \frac{1}{N}\sum f_i y_i^2 = \sigma_y^2 + \bar{y}^2, \tag{4}$$

where $\sigma_x^2$ is the variance of the variable x in the distribution, or the 
moment $\mu_{20}$ about the mean (x̄, ȳ); and likewise $\sigma_y^2$ is the variance 
of y, or the moment $\mu_{02}$ about the mean. Lastly, we may mention 
the moment $\mu'_{11}$, which is given by 

$$\mu'_{11} = \frac{1}{N}\sum f_i x_i y_i.$$

The corresponding moment $\mu_{11}$ about the mean of the distribution 
is called the covariance* of the variables. Thus 

$$\mu_{11} = \frac{1}{N}\sum f_i (x_i - \bar{x})(y_i - \bar{y}) = \mu'_{11} - \bar{x}\sum f_i y_i/N - \bar{y}\sum f_i x_i/N + \bar{x}\bar{y} = \mu'_{11} - \bar{x}\bar{y}, \tag{5}$$

in virtue of (1) and (3). This formula is very important, and will be 
used frequently. 
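
Formula (5) may be verified on a small set of figures. The Python sketch below is an illustration only; the pairs of ages and their frequencies are invented for the purpose and are not data from the text, and the function name is hypothetical. It computes the covariance both directly from deviations and as the product moment about the origin less the product of the means.

```python
def covariance_two_ways(pairs, freqs):
    """Return (shortcut, direct) values of the covariance mu_11.

    shortcut: product moment about the origin minus x_bar * y_bar, as in (5).
    direct:   mean product of the deviations from the means.
    """
    total = sum(freqs)
    x_bar = sum(f * x for (x, _), f in zip(pairs, freqs)) / total
    y_bar = sum(f * y for (_, y), f in zip(pairs, freqs)) / total
    mu11_origin = sum(f * x * y for (x, y), f in zip(pairs, freqs)) / total
    direct = sum(
        f * (x - x_bar) * (y - y_bar) for (x, y), f in zip(pairs, freqs)
    ) / total
    return mu11_origin - x_bar * y_bar, direct


# Hypothetical ages (bridegroom, bride) with hypothetical frequencies.
pairs = [(25, 22), (30, 27), (28, 30), (35, 29)]
freqs = [2, 1, 3, 4]

shortcut, direct = covariance_two_ways(pairs, freqs)
print(shortcut, direct)
```

The two values agree apart from rounding, the first form being the more convenient in numerical work since no deviations need be formed.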

* Many writers denote this quantity by p. We are, however, loth to 
overwork the symbol p by giving it still another meaning. 

24. Continuous distributions 

A continuous bivariate distribution includes all pairs of values 
represented by points within a certain region of the xy plane, each 
pair of values occurring at least once. The number of pairs of values 
is therefore infinite. Distributions of the type ordinarily employed 
have each for relative frequency density a continuous function f(x, y), 
which is such that the relative frequency for the infinitesimal 
rectangular region, of area dx dy, and defined by 

$$x - \tfrac{1}{2}dx \le x \le x + \tfrac{1}{2}dx, \qquad y - \tfrac{1}{2}dy \le y \le y + \tfrac{1}{2}dy,$$

has the value f(x, y) dx dy. Thus f(x, y) is the relative frequency per 
unit area at the point (x, y), and the sum of the relative frequencies 
for all elements of the distribution is unity, so that 

$$\iint f(x, y)\,dx\,dy = 1, \tag{6}$$

the integration extending over the region* of the plane which 
represents the distribution. The mean (x̄, ȳ) of the distribution is given by 

$$\bar{x} = \iint x f(x, y)\,dx\,dy, \qquad \bar{y} = \iint y f(x, y)\,dx\,dy. \tag{7}$$

For higher moments we have formulae corresponding to those of 
a discrete distribution. Thus 

$$\mu_{20} = \iint (x - \bar{x})^2 f(x, y)\,dx\,dy = \mu'_{20} - 2\bar{x}\iint x f(x, y)\,dx\,dy + \bar{x}^2,$$

which is equivalent to 

$$\sigma_x^2 = \mu'_{20} - \bar{x}^2; \tag{8}$$

and similarly, 

$$\sigma_y^2 = \mu'_{02} - \bar{y}^2. \tag{9}$$

The covariance, being the first product moment about the mean, is 

$$\mu_{11} = \iint (x - \bar{x})(y - \bar{y}) f(x, y)\,dx\,dy = \mu'_{11} - \bar{x}\iint y f(x, y)\,dx\,dy - \bar{y}\iint x f(x, y)\,dx\,dy + \bar{x}\bar{y} = \mu'_{11} - \bar{x}\bar{y}, \tag{10}$$

in virtue of (6) and (7). 

It will be sufficient, as a rule, to give proofs of theorems for a 
discrete distribution. The student can easily rewrite these for the 
case of a continuous distribution, replacing sums by definite 
integrals, and the relative frequency $f_i/N$ by f(x, y) dx dy. Some of 
the steps will be indicated in Examples. 


25. Lines of regression 

It frequently happens that the scatter diagram indicates an
association between the variables, x and y, the distribution of dots
being denser in the neighbourhood of a certain curve, which may
be called a curve of regression. The equation of such a curve indicates
a functional relationship to which the association of the variables
approximates, more or less roughly. We shall consider later the
problem of fitting different curves to the data, so as to obtain the
curve of a specified form which best fits the data. For the present
we confine our attention to the straight line which is the best fit, or
more definitely, to the problem of determining two straight lines,
one of which gives the closest estimate a straight line can give to
the average value of y for each specified value of x, while the other
gives the corresponding estimate of x for a given value of y.
These are called the lines of regression of y on x, and of x on y
respectively.

* By defining f(x, y) as equal to zero outside this region, we may make the
range of integration $-\infty$ to $\infty$ for each variable.

Consider first the line of regression of y on x. We have to determine
constants a and b so that the equation

$$y = a + bx \tag{11}$$

gives, for each value of x, the best estimate a linear equation can
give for the average value of y. We interpret the term 'best estimate'
in accordance with the principle of least squares; that is to say, we
find a and b so as to minimize the sum of the squares of the deviations
of the actual values of y in the distribution from their estimates
given by (11). Thus if $P_i$ is the point of the diagram representing the
pair of values $(x_i, y_i)$, and $H_i$ is the point on the straight line (11)
with the same abscissa $x_i$, the deviation of $P_i$ from its estimate is
$H_iP_i$, whose value is clearly $y_i - (a + bx_i)$. Thus we have to choose
a and b so as to make $\sum_i f_i (y_i - a - bx_i)^2$ a minimum, $f_i$ being the
frequency of the pair of values $(x_i, y_i)$. For a minimum value the
partial derivatives of the above expression with respect to a and b
must both be zero. Hence the equations for determining these
constants are


$$\sum_i f_i (y_i - a - bx_i) = 0, \qquad \sum_i f_i x_i (y_i - a - bx_i) = 0; \tag{12}$$

which are called the normal equations. In virtue of (3), (4) and (5)
they are equivalent to

$$\bar{y} - a - b\bar{x} = 0 \tag{13}$$

and

$$\mu_{11} + \bar{x}\bar{y} - a\bar{x} - b(\sigma_x^2 + \bar{x}^2) = 0. \tag{14}$$

The first of these shows that the required line passes through the
mean $(\bar{x}, \bar{y})$ of the distribution. Also, on eliminating a between
(13) and (14), we find

$$\mu_{11} - b\sigma_x^2 = 0.$$

Consequently the gradient b of the line of regression of y on x is

$$b = \mu_{11}/\sigma_x^2 \tag{15}$$

and, since the line passes through $(\bar{x}, \bar{y})$, its equation may be expressed

$$y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}(x - \bar{x}). \tag{16}$$

If the mean of the distribution is taken as origin, the line of
regression of y on x is

$$y = \mu_{11}x/\sigma_x^2. \tag{17}$$

The gradient $\mu_{11}/\sigma_x^2$ is often called the coefficient of regression of
y on x.

Similarly, or by interchanging variables, we find that the line of
regression of x on y is

$$x - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}(y - \bar{y}), \tag{18}$$

and $\mu_{11}/\sigma_y^2$ is the coefficient of regression of x on y. The product of
the two coefficients of regression is symmetrical with respect to x
and y. Its square root, $\mu_{11}/\sigma_x\sigma_y$, is the coefficient of correlation, r.
Thus

$$r = \frac{\mu_{11}}{\sigma_x\sigma_y}, \tag{19}$$

having the same sign as the covariance $\mu_{11}$.
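Formulae (15), (18) and (19) translate directly into a short calculation. The following is a minimal sketch rather than a prescribed method: the function name `regression_summary` and the three data points are hypothetical, and each pair of values is supplied with its frequency f as in the text.

```python
from math import sqrt


def regression_summary(pairs):
    """Means, both regression gradients and r for weighted pairs.

    `pairs` is a list of (x, y, f) triples, f being the frequency of the pair.
    """
    N = sum(f for _, _, f in pairs)
    xbar = sum(f * x for x, _, f in pairs) / N
    ybar = sum(f * y for _, y, f in pairs) / N
    var_x = sum(f * (x - xbar) ** 2 for x, _, f in pairs) / N
    var_y = sum(f * (y - ybar) ** 2 for _, y, f in pairs) / N
    cov = sum(f * (x - xbar) * (y - ybar) for x, y, f in pairs) / N
    return {
        "xbar": xbar,
        "ybar": ybar,
        "b_yx": cov / var_x,             # gradient of y on x, equation (15)
        "b_xy": cov / var_y,             # gradient of x on y, from (18)
        "r": cov / sqrt(var_x * var_y),  # equation (19)
    }


data = [(0, 0, 1), (1, 1, 1), (2, 4, 1)]  # illustrative values only
s = regression_summary(data)
```

With these values the gradient of y on x is 2, and the product of the two gradients equals the square of r, as stated above.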
Example 1. Find the tangent of the inclination, $\beta$, of (18) to (16).
Since the gradients of the two lines are $\sigma_y^2/\mu_{11}$ and $\mu_{11}/\sigma_x^2$, we have

$$\tan\beta = \frac{\sigma_y^2/\mu_{11} - \mu_{11}/\sigma_x^2}{1 + \sigma_y^2/\sigma_x^2} = \frac{\sigma_x^2\sigma_y^2 - \mu_{11}^2}{\mu_{11}(\sigma_x^2 + \sigma_y^2)} = \frac{1 - r^2}{r}\,\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}.$$

Example 2. For a continuous bivariate distribution the mean square
deviation from the line of regression of y on x is

$$\iint (y - a - bx)^2 f(x, y)\,dx\,dy.$$

By minimizing this for variation of a and b, show that the normal equations are

$$\iint (y - a - bx) f(x, y)\,dx\,dy = 0 = \iint x(y - a - bx) f(x, y)\,dx\,dy,$$

and that these are expressible in the forms (13) and (14).


26. Coefficient of correlation. Standard error of estimate 

The significance of the correlation coefficient, r, as a measure of
the closeness of the association of the variables x and y, will be
apparent from the theorem now to be considered. The equation of
the line of regression of y on x was found by minimizing the sum of
the squares of the deviations $H_iP_i$. We shall now prove that the
sum of the squares of these deviations from the line of regression of
y on x is equal to $N\sigma_y^2(1 - r^2)$.

Let the mean of the distribution be taken as origin, so that
$\bar{x} = \bar{y} = 0$. Then, in virtue of (17), $H_iP_i$ is equal to $y_i - bx_i$; and the
required sum of squares of the deviations is

$$\begin{aligned}
\sum_i f_i (y_i - bx_i)^2 &= \sum_i f_i y_i^2 - 2b\sum_i f_i x_i y_i + b^2\sum_i f_i x_i^2 \\
&= N\sigma_y^2 - 2Nb\mu_{11} + Nb^2\sigma_x^2 \\
&= N\sigma_y^2 - N\mu_{11}^2/\sigma_x^2 \\
&= N\sigma_y^2(1 - r^2),
\end{aligned} \tag{20}$$

as stated above. Denoting this sum of squares by $NS_y^2$ we have

$$S_y^2 = \sigma_y^2(1 - r^2) \tag{21}$$

or, equivalently,

$$r^2 = 1 - S_y^2/\sigma_y^2. \tag{22}$$
Since $S_y^2$ is the mean square deviation of points from the line of
regression of y on x, $S_y$ is called the standard error of estimate of y
from the regression equation (16). In the same way the sum of the
squares of the deviations of points from the line of regression of x
on y, measured parallel to the x-axis, is $NS_x^2$, where

$$S_x^2 = \sigma_x^2(1 - r^2), \tag{23}$$

and $S_x$ is the standard error of estimate of x from the regression
equation (18).
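The identity behind the standard error of estimate can be confirmed on any small set of data. In the sketch below, which is an illustration only, the function name `standard_error_of_estimate` and the data are hypothetical; the first returned value is computed directly from the deviations about the fitted line, the second from $\sigma_y\sqrt{1 - r^2}$.

```python
from math import sqrt


def standard_error_of_estimate(pairs):
    """Return (S_y from the residuals, sigma_y * sqrt(1 - r^2)).

    `pairs` is a list of (x, y, f) triples, f being the frequency of the pair.
    """
    N = sum(f for _, _, f in pairs)
    xbar = sum(f * x for x, _, f in pairs) / N
    ybar = sum(f * y for _, y, f in pairs) / N
    var_x = sum(f * (x - xbar) ** 2 for x, _, f in pairs) / N
    var_y = sum(f * (y - ybar) ** 2 for _, y, f in pairs) / N
    cov = sum(f * (x - xbar) * (y - ybar) for x, y, f in pairs) / N
    b = cov / var_x                                  # gradient of y on x
    r2 = cov ** 2 / (var_x * var_y)
    # Mean square deviation from the line of regression of y on x.
    mean_sq = sum(f * ((y - ybar) - b * (x - xbar)) ** 2 for x, y, f in pairs) / N
    return sqrt(mean_sq), sqrt(var_y * (1 - r2))


direct, formula = standard_error_of_estimate(
    [(1, 2, 3), (2, 2, 1), (3, 5, 2), (4, 4, 4)]     # illustrative values only
)
```

The two values agree to rounding error, whatever data are supplied, provided the x's are not all equal.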

Since the sum of squares of deviations cannot be negative, it follows
from (20) that $r^2 \le 1$, or

$$-1 \le r \le 1. \tag{24}$$

If r = 1 or -1, the sum of squares of deviations from either line of
regression is zero. Consequently each deviation is zero, and all the
points lie on both lines of regression. These two lines then coincide,
and there is a linear functional relation between the variables x
and y, giving perfect correlation. The nearer $r^2$ is to unity, the closer
are the points to the lines of regression, and the nearer are these two
lines to coincidence (cf. § 25, Ex. 1). Thus the magnitude of r may be
taken as a measure of the degree to which the association between the
variables approaches a linear functional relationship. The sign of r
is the same as that of the covariance $\mu_{11}$, and therefore also the same
as that of the gradients of the lines of regression. Hence r is positive
when, on the whole, y increases with x, and negative when y decreases
as x increases. When r is zero the variables are usually described as
uncorrelated.

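The coefficient of correlation between two variables is also the
coefficient of correlation between the deviations of the variables
from their means. For the value of r depends only on $\mu_{11}$, $\sigma_x$ and $\sigma_y$,
and these are functions of the above deviations. Thus

$$r = \frac{\sum_i f_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i f_i (x_i - \bar{x})^2 \,\sum_i f_i (y_i - \bar{y})^2}}$$

and, in virtue of (3), (4) and (5), this may be expressed

$$r = \frac{N\sum_i f_i x_i y_i - \left(\sum_i f_i x_i\right)\left(\sum_i f_i y_i\right)}{\sqrt{\left[N\sum_i f_i x_i^2 - \left(\sum_i f_i x_i\right)^2\right]\left[N\sum_i f_i y_i^2 - \left(\sum_i f_i y_i\right)^2\right]}}.$$

This formula is sometimes convenient.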
Example. Taking the mean as origin for a continuous distribution, show
that the mean square deviation, $S_y^2$, from the line of regression of y on x
is given by

$$S_y^2 = \iint (y - bx)^2 f(x, y)\,dx\,dy = \sigma_y^2 - 2b\mu_{11} + b^2\sigma_x^2 = \sigma_y^2(1 - r^2).$$

27. Estimates from the regression equation 

A few simple relations between the $y_i$ and their estimates, $Y_i$,
given by the regression equation (11) or (16), play an important
part in later work. Thus

$$Y_i = a + bx_i, \tag{26}$$

and the normal equations (12) are expressible as

$$\sum_i f_i (y_i - Y_i) = 0 \tag{27}$$

and

$$\sum_i f_i x_i (y_i - Y_i) = 0. \tag{28}$$

From (27) it follows that the mean of the Y's is equal to the mean
$\bar{y}$ of the y's. Also, on multiplying (27) and (28) by a and b respectively
and adding, we deduce

$$\sum_i f_i Y_i (y_i - Y_i) = 0, \tag{29}$$

and therefore, in virtue of (27),

$$\sum_i f_i (y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{30}$$

This relation is very important.

The sum of the squares of the deviations of the y's from their
mean may then be expressed

$$\begin{aligned}
\sum_i f_i (y_i - \bar{y})^2 &= \sum_i f_i \left[(y_i - Y_i) + (Y_i - \bar{y})\right]^2 \\
&= \sum_i f_i (y_i - Y_i)^2 + \sum_i f_i (Y_i - \bar{y})^2,
\end{aligned} \tag{31}$$

since the sum of the products vanishes by (30). Now the first
member of (31) has the value $N\sigma_y^2$. The first sum in the second
member has been proved equal to $N\sigma_y^2(1 - r^2)$. Consequently

$$\sum_i f_i (Y_i - \bar{y})^2 = Nr^2\sigma_y^2, \tag{32}$$

showing that the variance of the Y's is $r^2$ times that of the y's; or

$$\sigma_Y^2 = r^2\sigma_y^2, \qquad \sigma_Y = |r|\,\sigma_y. \tag{33}$$
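Relations (27), (30) and (32) lend themselves to a quick numerical check. The sketch below is an illustration under assumed data: the function name `check_estimates` and the four weighted pairs are hypothetical, and the estimates $Y_i$ are formed from (26) with a and b found as in § 25.

```python
def check_estimates(pairs):
    """For weighted pairs (x, y, f) return the sums in (27) and (30),
    the variance of the estimates Y, and r^2 times the variance of y."""
    N = sum(f for _, _, f in pairs)
    xbar = sum(f * x for x, _, f in pairs) / N
    ybar = sum(f * y for _, y, f in pairs) / N
    var_x = sum(f * (x - xbar) ** 2 for x, _, f in pairs) / N
    var_y = sum(f * (y - ybar) ** 2 for _, y, f in pairs) / N
    cov = sum(f * (x - xbar) * (y - ybar) for x, y, f in pairs) / N
    b = cov / var_x
    a = ybar - b * xbar
    r2 = cov ** 2 / (var_x * var_y)
    rows = [(y, a + b * x, f) for x, y, f in pairs]   # (y_i, Y_i, f_i), Y_i from (26)
    s27 = sum(f * (y - Y) for y, Y, f in rows)
    s30 = sum(f * (y - Y) * (Y - ybar) for y, Y, f in rows)
    var_Y = sum(f * (Y - ybar) ** 2 for y, Y, f in rows) / N
    return s27, s30, var_Y, r2 * var_y


s27, s30, var_Y, r2_var_y = check_estimates(
    [(1, 2, 3), (2, 2, 1), (3, 5, 2), (4, 4, 4)]      # illustrative values only
)
```

Both sums vanish to rounding error, and the last two returned values coincide, in agreement with (33).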
Finally, we may show that the coefficient of correlation between
the $y_i$ and their estimates $Y_i$ is equal to $|r|$. Take the origin at the
common mean of these variables, so that $\bar{y} = 0$. Then, in virtue
of (29),

$$\sum_i f_i y_i Y_i = \sum_i f_i Y_i^2 = N\sigma_Y^2,$$

and the coefficient of correlation between the y's and the Y's is

$$\frac{\sum_i f_i y_i Y_i / N}{\sigma_y\sigma_Y} = \frac{\sigma_Y^2}{\sigma_y\sigma_Y} = \frac{\sigma_Y}{\sigma_y} = |r|. \tag{34}$$


Example 1. Prove (34) as follows. Taking the mean of the distribution as
origin, so that $Y_i = bx_i$, $\sigma_Y = |b|\,\sigma_x$, we have for the required correlation
coefficient

$$\frac{\sum_i f_i y_i Y_i}{N\sigma_y\sigma_Y} = \frac{b\sum_i f_i x_i y_i}{N\,|b|\,\sigma_x\sigma_y} = \frac{b}{|b|}\,r = |r|,$$

since r and b have the same sign.

Example 2. Show that the normal equations for a continuous distribution
(cf. § 25, Ex. 2) are expressible as

$$\iint (y - Y) f(x, y)\,dx\,dy = 0, \qquad \iint x(y - Y) f(x, y)\,dx\,dy = 0,$$

and from these deduce the relations

$$\iint Y(y - Y) f(x, y)\,dx\,dy = 0, \qquad \iint (y - Y)(Y - \bar{y}) f(x, y)\,dx\,dy = 0$$

corresponding to (29) and (30). Hence show that

$$\sigma_y^2 = \iint (y - Y)^2 f(x, y)\,dx\,dy + \iint (Y - \bar{y})^2 f(x, y)\,dx\,dy,$$

and deduce that $\iint (Y - \bar{y})^2 f(x, y)\,dx\,dy = r^2\sigma_y^2$.


28. Change of units 

Before illustrating the above theory by a numerical example, let
us examine the effect of a change of units on the calculation of r,
$\mu_{11}$ and b. As in § 2, let u be the measure of the deviation of x from an
origin x = a, in terms of a unit c times the original x-unit, so that
x = a + cu. Then, if x' and u' are the deviations of the two variables
from their means, x' = cu', and

$$\sigma_x^2 = \frac{\sum_i f_i x_i'^2}{N} = \frac{c^2\sum_i f_i u_i'^2}{N} = c^2\sigma_u^2.$$

Similarly, if v is the measure of the deviation of y from an origin
y = a', in terms of a unit c' times the original y-unit, y = a' + c'v,
y' = c'v' and $\sigma_y = c'\sigma_v$. Then the coefficient of correlation between
x and y is

$$r = \frac{\sum_i f_i x_i' y_i'/N}{\sigma_x\sigma_y} = \frac{cc'\sum_i f_i u_i' v_i'/N}{cc'\,\sigma_u\sigma_v} = \frac{\text{covariance of } u \text{ and } v}{\sigma_u\sigma_v},$$

and is thus equal to the correlation between u and v. The value of r
is therefore independent of the units employed, and is thus an
absolute measure of correlation. But the covariance of x and y is

$$\mu_{11} = \sum_i f_i x_i' y_i'/N = cc'\sum_i f_i u_i' v_i'/N = cc'\,(\text{covariance of } u \text{ and } v). \tag{35}$$

Similarly, the regression coefficient of y on x is

$$b = \frac{\mu_{11}}{\sigma_x^2} = \frac{c'}{c}\,(\text{regression coefficient of } v \text{ on } u). \tag{36}$$

If then c and c' are equal, the value of b is the same for both pairs of
variables.
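The three statements of this section (r unchanged, covariance multiplied by cc', gradient multiplied by c'/c) can be verified on assumed data. In the following sketch the function name `summary`, the values of u, v and f, and the particular origins and units are all hypothetical; `a2` and `c2` stand for a' and c'.

```python
from math import sqrt


def summary(xs, ys, fs):
    """Return (covariance, r, gradient of y on x) for weighted pairs."""
    N = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / N
    ybar = sum(f * y for y, f in zip(ys, fs)) / N
    cov = sum(f * (x - xbar) * (y - ybar) for x, y, f in zip(xs, ys, fs)) / N
    var_x = sum(f * (x - xbar) ** 2 for x, f in zip(xs, fs)) / N
    var_y = sum(f * (y - ybar) ** 2 for y, f in zip(ys, fs)) / N
    return cov, cov / sqrt(var_x * var_y), cov / var_x


us = [-2, -1, 0, 1, 2]          # illustrative values only
vs = [-1, 0, 0, 2, 1]
fs = [1, 2, 3, 2, 1]
a, c = 27.5, 5.0                # x = a + c*u
a2, c2 = 22.5, 2.0              # y = a' + c'*v
xs = [a + c * u for u in us]
ys = [a2 + c2 * v for v in vs]

cov_uv, r_uv, b_uv = summary(us, vs, fs)
cov_xy, r_xy, b_xy = summary(xs, ys, fs)
```

Here `r_xy` equals `r_uv`, `cov_xy` equals `c * c2 * cov_uv`, and `b_xy` equals `(c2 / c) * b_uv`, as (35) and (36) require for positive units.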

29. Numerical illustration 

For 1,000 marriages the ages of bridegroom and bride, x and y years
respectively, are grouped in the table below with class interval of 5 years for
each, the frequencies for the different classes being shown in the body of the
table. Find the regression equations and the coefficient of correlation between
the variables.

[Correlation table of the ages of bridegroom (x) and bride (y) for 1,000 marriages: not reproduced here.]

Such a table is called a correlation table. The values of x and y indicated
are the mid-values in the classes. Thus for the class in which the age of the
bridegroom is between 25 and 30, and the age of the bride between 20 and 25,
the values of x and y are taken as 27.5 and 22.5 respectively, and the frequency
is 190. The data represented in any one column constitute a vertical array,
or an array of y's, because in each such array y assumes different values
while x remains constant. Similarly, the data in any row constitute a
horizontal array, or an array of x's. In the row prefixed $N_u$ the frequencies
are given for the individual columns, and in the column headed $N_v$ the
frequencies for the separate rows.

Let us take as new origin the point (27.5, 27.5), whose coordinates are
the mid-values of the class 25 to 30 for x and y; and, as a new unit
for each variable, the common class interval of 5 years. Then the deviations,
u and v, of the ages of bridegroom and bride from the new origin in terms of
the new unit are those shown in the second row and the second column. The
fourth row from the bottom of the table, prefixed $uN_u$, gives the sum $\sum fu$
for each column; and these are added horizontally to give the total sum 124
for the distribution. The next row gives the sum $\sum fu^2$ for each column, with
total sum 1,834 for the distribution. The row prefixed V gives for each
column the sum $\sum fv$. Thus the sum -28, in the column u = -2, is obtained
from

11(-2) + 6(-1) = -28.

The last row gives the sum $\sum fuv$ for each column, each entry being obtained
by multiplying the value of V above by the common value of u for that
column. Summing horizontally all the values uV, we have the product sum
$\sum fuv$ for the whole distribution, viz. 1,109.

The columns to the right of the table are explained similarly. That headed
U gives for each row the sum $\sum fu$. Thus the first entry -79 in the column is
obtained from

11(-2) + 62(-1) + 19 x 0 + 3 x 1 + 1 x 2 = -79.

The last column, headed vU, gives the sum $\sum fuv$ for each row, any entry
being obtained by multiplying the value of U to the left by the common value
of v for that row. Summing the entries in this column we find again the
product sum $\sum fuv$ for the whole distribution, thus providing a check on the
calculations.

Using the values thus obtained we have

$$\bar{u} = 124/1000 = 0.124, \qquad \bar{v} = -371/1000 = -0.371.$$

Therefore

$$\bar{x} = 27.5 + 5(0.124) = 28.120, \qquad \bar{y} = 27.5 - 5(0.371) = 25.645,$$

giving the mean ages of bridegroom and bride. The variance of u is

$$\sigma_u^2 = 1.834 - (0.124)^2 = 1.8186,$$

so that $\sigma_u = 1.349 = 1.35$ nearly, and therefore $\sigma_x = 6.745 = 6.75$ nearly.
Similarly, we find

$$\sigma_v^2 = 1.531 - (0.371)^2 = 1.3934, \qquad \sigma_v = 1.18; \qquad \sigma_y = 5.90.$$

The coefficient of correlation between x and y is equal to that between u
and v. Thus

$$r = \frac{\sum fuv/N - \bar{u}\bar{v}}{\sigma_u\sigma_v} = \frac{1.109 + 0.371 \times 0.124}{1.349 \times 1.180} = 0.726 = 0.73 \text{ nearly}.$$

Since the units for u and v are equal, the regression coefficient of y on x is
equal to that of v on u, which is

$$(\text{covariance of } u \text{ and } v)/\sigma_u^2 = 1.155/1.8186 = 0.635.$$

The line of regression of y on x is therefore

$$y - 25.645 = 0.635(x - 28.12),$$

that is

$$y = 7.79 + 0.635x.$$

The student may find similarly that the line of regression of x on y is

$$x = 6.86 + 0.829y.$$

30. Correlation of ranks 

A group of n individuals may be arranged in order of merit or
proficiency in the possession of a certain characteristic. The same
group would, as a rule, give different orders for different
characteristics. Considering the orders corresponding to two characteristics,
A and B, let $x_i$, $y_i$ be the ranks of the ith individual in A and B
respectively. Then the coefficient of correlation between the x's and
the y's is called the rank correlation coefficient in the characteristics
A and B for that group of individuals. On the assumption that no
two individuals are bracketed equal in either classification, each of
the variables takes the values 1, 2, 3, ..., n; and therefore

$$\bar{x} = \tfrac{1}{2}(n + 1) = \bar{y}. \tag{37}$$

As a rule $x_i$ is not equal to $y_i$. Let $d_i$ denote the difference, so that

$$d_i = x_i - y_i. \tag{38}$$

Then, if x' and y' denote the deviations of the variables from their
means, we have also

$$d_i = x_i' - y_i'. \tag{39}$$

The coefficient of correlation between the variables is given by

$$r = \frac{\sum_i x_i' y_i'}{\sqrt{\sum_i x_i'^2 \,\sum_i y_i'^2}}. \tag{40}$$

To express this in terms of n and the differences, we observe that
the variance of each of the variables x and y is $(n^2 - 1)/12$ (cf. § 3,
Ex. 2). Therefore

$$\sum_i x_i'^2 = \sum_i y_i'^2 = (n^3 - n)/12. \tag{41}$$

Also

$$\sum_i d_i^2 = \sum_i (x_i' - y_i')^2 = \sum_i x_i'^2 + \sum_i y_i'^2 - 2\sum_i x_i' y_i',$$

and thus, in virtue of (41),

$$\sum_i x_i' y_i' = \frac{n^3 - n}{12} - \frac{1}{2}\sum_i d_i^2.$$

Substitution of these values in (40) gives

$$r = 1 - \frac{6\sum_i d_i^2}{n^3 - n}. \tag{42}$$

This is the required formula for the coefficient of correlation of
ranks.

If the correlation is perfect all the d's are zero, and r = 1. If the
orders in the two characteristics are exactly the reverse of each
other, $x_i + y_i = n + 1$. All the points of the scatter diagram then lie
on a straight line with negative gradient. Consequently r = -1,
and there is perfect inverse correlation.

Example. The ranks of the same 16 students in Mathematics and Physics
were as follows, two numbers within brackets denoting the ranks of the same
student in Mathematics and Physics respectively: (1, 1), (2, 10), (3, 3), (4, 4),
(5, 5), (6, 7), (7, 2), (8, 6), (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12),
(15, 16), (16, 13). Calculate the rank correlation coefficient for proficiencies
of this group in Mathematics and Physics.

Here $d_i$ is the difference between the two numbers in the ith pair of brackets.
It is easily verified that $\sum d_i^2 = 136$, $n^3 - n = 16 \times 255$. Consequently

$$r = 1 - \frac{6 \times 136}{16 \times 255} = 1 - \frac{1}{5} = 0.8.$$

31. Bivariate probability distributions 

The theorems proved for a bivariate frequency distribution in
§§ 23-28 hold equally for a probability distribution of two variates,
relative frequency in the former case being replaced by probability
in the latter. Thus, for a discrete distribution, if $p_i$ is the
probability of the occurrence of the pair of values $(x_i, y_i)$, the moment
$\mu'_{rs}$ about the origin is

$$\mu'_{rs} = \sum_i p_i x_i^r y_i^s,$$

which is the expected value of the product $x^r y^s$. In particular the
expected values of x and y are

$$E(x) = \mu'_{10}, \qquad E(y) = \mu'_{01},$$

while, corresponding to (4) and (5), we have

$$\sigma_x^2 = \mu_{20} = \mu'_{20} - [E(x)]^2$$

and

$$\mu_{11} = \mu'_{11} - E(x)E(y),$$

$\mu_{11}$ being the covariance of the variates, which is the expected value
of the product of their deviations from their means. The coefficient
of correlation is defined by (19), which may here be expressed in the
alternative form

$$r = \frac{E(x'y')}{\sigma_x\sigma_y} = \frac{E(x'y')}{\sqrt{E(x'^2)\,E(y'^2)}}, \tag{19'}$$

x', y' being the deviations of the variates from their expected values.

In the case of continuous probability distributions we confine
our attention to those in which the probability that the variates
will fall simultaneously in the intervals dx and dy is expressible in
the form $\phi(x, y)\,dx\,dy$, the probability density $\phi(x, y)$ being
continuous and essentially positive. Then we have formulae
corresponding to (6)-(10), with $\phi(x, y)$ in place of f(x, y).

The variates are independent if the probability distribution of
each is independent of the value assumed by the other. In particular,
if the variates are continuous, with probability densities $\phi_1(x)$ and
$\phi_2(y)$ respectively, the probability that x will fall in the interval dx,
and at the same time y will fall in the interval dy, is $\phi_1(x)\,dx\,\phi_2(y)\,dy$
by the theorem of compound probability. Then the probability
density $\phi(x, y)$ for the bivariate distribution is of the form $\phi_1(x)\,\phi_2(y)$.
Conversely, when this relation holds, the continuous variates are
independent. Now it was proved in §§ 10 and 12 that the covariance
of two independent variates is equal to zero. It follows from the
above definition of r that the coefficient of correlation of two independent
variates is equal to zero. The converse of this theorem, however, is
not necessarily true; that is to say, uncorrelated variables are not
necessarily independent.
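A small discrete case shows that zero correlation does not imply independence. The example below is not from the text: it assumes that x takes the values -1, 0, 1 with equal probabilities and that y is always the square of x, and it works in exact fractions. The names `dist` and `expect` are hypothetical.

```python
from fractions import Fraction

# Assumed distribution: x = -1, 0, 1 each with probability 1/3, and y = x*x.
dist = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}


def expect(g):
    """Expected value of g(x, y) under the distribution above."""
    return sum(p * g(x, y) for (x, y), p in dist.items())


# Covariance as E(xy) - E(x)E(y).
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(x=0, y=0) = P(x=0) * P(y=0).
p_x0 = sum(p for (x, y), p in dist.items() if x == 0)
p_y0 = sum(p for (x, y), p in dist.items() if y == 0)
p_joint = dist.get((0, 0), Fraction(0))
independent = (p_joint == p_x0 * p_y0)
```

The covariance, and therefore r, is exactly zero, yet the joint probability 1/3 differs from the product 1/9, so the variates are not independent; indeed y is a function of x.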

The moment generating function of the bivariate distribution is
defined as the expected value of the function $\exp(t_1x + t_2y)$, where
$t_1$ and $t_2$ are independent of x and y. Thus, in the case of a
continuous distribution, the m.g.f. with respect to the origin is

$$M(t_1, t_2) = \iint \exp(t_1x + t_2y)\,\phi(x, y)\,dx\,dy.$$

When this integral has a meaning the exponential may be expanded
in powers of $t_1$ and $t_2$, and the coefficient of $t_1^r t_2^s/(r!\,s!)$ is the moment
$\mu'_{rs}$ about the origin. In Ex. 3 at the end of Chapter V we shall find
this function for the bivariate normal distribution, and use it to
calculate the moments.

When the variates x and y are independent we have

$$M(t_1, t_2) = E[\exp(t_1x + t_2y)] = E[\exp(t_1x)]\,E[\exp(t_2y)] = (\text{function of } t_1) \times (\text{function of } t_2).$$

And conversely, when the m.g.f. is of this form, the variates are
independent.


32. Variance of a sum of variates

Consider the variates x and y, with standard deviations $\sigma_x$ and
$\sigma_y$ respectively. Their distributions determine a bivariate
probability distribution for the two variates, with correlation coefficient
r given by (19'); and each value that may be taken by their sum

$$u = x + y \tag{43}$$

has a definite probability. Then, in virtue of § 10 (10),

$$E(u) = E(x) + E(y), \tag{44}$$

and therefore, if x', y', u' are the deviations of the variates from
their means, we obtain from (43) and (44) by subtraction

$$u' = x' + y'.$$

Consequently

$$u'^2 = x'^2 + y'^2 + 2x'y',$$

and, on taking the expected value of each member we deduce, in
virtue of (19'),

$$\sigma_u^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y. \tag{45}$$

This is the required formula for the variance of the sum of the
variates. And, since -r is the correlation between x and -y, it
follows that the variance of the difference

$$v = x - y$$

is given by

$$\sigma_v^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y. \tag{46}$$

If r is zero, as in the case of independence of x and y, we have the
simple result

$$\sigma_u^2 = \sigma_v^2 = \sigma_x^2 + \sigma_y^2, \tag{47}$$

already proved in §§ 10, 15 and 16.
More generally, let u be a linear function of the variates x, y, z, ...,
so that

$$u = ax + by + cz + \dots, \tag{48}$$

the constants a, b, c, ... being either positive or negative. Then

$$E(u) = aE(x) + bE(y) + cE(z) + \dots$$

and therefore by subtraction

$$u' = ax' + by' + cz' + \dots,$$

the primes indicating that the variates are measured from their
means. On squaring both sides, and taking expected values, we
obtain the required formula

$$\sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + \dots + 2ab\,r_{xy}\sigma_x\sigma_y + \dots, \tag{49}$$

in which $r_{xy}$ is the coefficient of correlation between x and y.
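Formula (49) can be checked on a small set of equally weighted observations of three variates. The sketch below is an illustration under assumed data: the lists x, y, z and the constants a, b, c are hypothetical, and variances are taken with divisor N, as throughout the text.

```python
from math import sqrt


def cov(p, q):
    """Covariance (divisor N) of two equally weighted lists."""
    n = len(p)
    pm, qm = sum(p) / n, sum(q) / n
    return sum((pi - pm) * (qi - qm) for pi, qi in zip(p, q)) / n


def corr(p, q):
    """Coefficient of correlation between two lists."""
    return cov(p, q) / sqrt(cov(p, p) * cov(q, q))


x = [1.0, 2.0, 4.0, 7.0]        # illustrative values only
y = [3.0, 1.0, 2.0, 6.0]
z = [0.0, 5.0, 1.0, 2.0]
a, b, c = 2.0, -1.0, 0.5

u = [a * xi + b * yi + c * zi for xi, yi, zi in zip(x, y, z)]
sx, sy, sz = (sqrt(cov(v, v)) for v in (x, y, z))

direct = cov(u, u)                                   # variance of u from its values
formula = (a * a * sx ** 2 + b * b * sy ** 2 + c * c * sz ** 2
           + 2 * a * b * corr(x, y) * sx * sy        # product terms of (49)
           + 2 * a * c * corr(x, z) * sx * sz
           + 2 * b * c * corr(y, z) * sy * sz)
```

The two values of the variance of u agree to rounding error. Setting c = 0 and a = b = 1 reduces the formula to (45).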
EXAMPLES IV

1. Show that, if a and b are constants and r is the correlation
between x and y, then the correlation between ax and by is equal to
r if the signs of a and b are alike, and to -r if they are different.

Also show that, if the constants a, b, c are positive, the correlation
between (ax + by) and cy is equal to

$$\frac{ar\sigma_x + b\sigma_y}{\sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\sigma_x\sigma_y}}.$$
2. The variables x and y are connected by the equation
ax + by + c = 0. Show that the correlation between them is -1 if
the signs of a and b are alike, and +1 if they are different.

3. Show that, if x', y' are the deviations of the variables from
their means,

$$r = 1 - \frac{1}{2N}\sum_i f_i\left(\frac{x_i'}{\sigma_x} - \frac{y_i'}{\sigma_y}\right)^2$$

and

$$r = -1 + \frac{1}{2N}\sum_i f_i\left(\frac{x_i'}{\sigma_x} + \frac{y_i'}{\sigma_y}\right)^2,$$

and deduce that $-1 \le r \le 1$. (Rietz, 1927, 2, p. 84.)

4. The variates x and y have zero means, the same variance $\sigma^2$
and zero correlation. Show that

$$(x\cos\alpha + y\sin\alpha) \quad\text{and}\quad (x\sin\alpha - y\cos\alpha)$$

have the same variance and zero correlation.

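5. Weighted mean with minimum variance. Let $x_i$ (i = 1, ..., n)
be n independent variates with variances $\sigma_i^2$. If the variates are
given weights $w_i$ their weighted mean is

$$\frac{\sum_i w_i x_i}{\sum_i w_i} = \sum_i c_i x_i, \qquad\text{where}\qquad c_i = \frac{w_i}{\sum_j w_j}.$$

We can show that the variance of this weighted mean is least when
the weights are inversely proportional to the variances $\sigma_i^2$. In virtue
of (49) the variance of the weighted mean is

$$\sigma^2 = \sum_i c_i^2\sigma_i^2 = \frac{\sum_i w_i^2\sigma_i^2}{\left(\sum_j w_j\right)^2}.$$

For a minimum $\sigma^2$ the partial derivatives of this with respect to the
$w_i$ must be zero. This requires

$$w_i\sigma_i^2\sum_j w_j - \sum_j w_j^2\sigma_j^2 = 0, \qquad\text{so that}\qquad w_i\sigma_i^2 = \frac{\sum_j w_j^2\sigma_j^2}{\sum_j w_j},$$

showing that $w_i\sigma_i^2$ is the same for all values of i. Thus the weights
of the variates are inversely proportional to their variances.

Show that this minimum variance is equal to H/n, where H is
the harmonic mean of the variances $\sigma_i^2$.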
6. For a given bivariate distribution find the straight line for
which the sum of the squares of the normal deviations is a minimum.

Let the straight line be $x\cos\alpha + y\sin\alpha = p$. Then we have to
minimize the sum of squares $\sum_i f_i (x_i\cos\alpha + y_i\sin\alpha - p)^2$ by equating
to zero its partial derivatives with respect to p and $\alpha$. Show, from
the first of the equations thus obtained, that the required line passes
through the mean of the distribution. Then, taking the mean as
origin, show from the second equation that $\alpha$ is given by

$$\tan 2\alpha = \frac{2\mu_{11}}{\sigma_x^2 - \sigma_y^2}.$$

Of the two directions at right angles, found from this equation, one
makes the sum of squares a minimum, and the other a maximum,
for lines through the mean of the distribution. (L. J. Reed, Metron,
vol. I, 1921, part 3, pp. 54-61.)


7. The ranks of the same 15 students in Mathematics and Latin
were as follows, the two numbers within brackets denoting the ranks
of the same student: (1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3),
(8, 1), (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13).
Show that the rank correlation coefficient is 0.51.


8. The marks, x and y, gained by 1,000 students for theory and
laboratory work respectively, are grouped with common class
interval of 5 marks for each variable, the frequencies for the various
classes being shown in the correlation table below. The values of
x and y indicated are the mid-values of the classes. Show that the
coefficient of correlation is 0.68, and the regression equation of
y on x

$$y = 29.7 + 0.656x.$$

  y \ x     42    47    52    57    62    67    72    77    82   Totals
    52       3     9    19     4     -     -     -     -     -       35
    57       9    26    37    25     6     -     -     -     -      103
    62      10    38    74    45    19     6     -     -     -      192
    67       4    20    59    96    54    23     7     -     -      263
    72       -     4    30    54    74    43     9     -     -      214
    77       -     -     7    18    31    50    19     5     -      130
    82       -     -     -     2     5    13    15     8     3       46
    87       -     -     -     -     -     2     5     8     2       17
  Totals    26    97   226   244   189   137    55    21     5    1,000

9. The logarithm of the m.g.f., $K(t_1, t_2) = \log M(t_1, t_2)$, is the cumulative
function and the cumulant $\kappa_{rs}$ is the coefficient of $t_1^r t_2^s/(r!\,s!)$
in the expansion of $K(t_1, t_2)$ in powers of $t_1$ and $t_2$. Show that

$$\kappa_{11} = \mu_{11}, \quad \kappa_{21} = \mu_{21}, \quad \kappa_{31} = \mu_{31} - 3\mu_{20}\mu_{11}, \quad \kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2.$$

10. By means of the identity

$$2\sum fuv = \sum fu^2 + \sum fv^2 - \sum f(u - v)^2$$

and the fact that u - v is constant along each of one set of the
diagonal lines of a correlation table, the sum of products $\sum fuv$ may
be calculated without difficulty. Tabulate the quantities u - v,
$(u - v)^2$, f and $f(u - v)^2$ for each diagonal line of the correlation
table of § 29 (p. 77), and deduce that $\sum f(u - v)^2 = 1147$. Using the
values found for $\sum fu^2$ and $\sum fv^2$ verify that $\sum fuv = 1109$.

Deduce the same result from the identity

$$2\sum fuv = \sum f(u + v)^2 - \sum fu^2 - \sum fv^2$$

using the other set of diagonal lines.
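As a check on the correlation table of Ex. 8, the stated results can be recomputed directly from the cell frequencies. The sketch below is an illustration only: the frequencies are entered as read here from a damaged table, chosen so that every row and column agrees with its total, and should be treated as an assumption; the variable names are hypothetical.

```python
from math import sqrt

x_mid = [42, 47, 52, 57, 62, 67, 72, 77, 82]   # mid-values of the x classes
table = {                                      # y mid-value -> frequencies by x class
    52: [3, 9, 19, 4, 0, 0, 0, 0, 0],
    57: [9, 26, 37, 25, 6, 0, 0, 0, 0],
    62: [10, 38, 74, 45, 19, 6, 0, 0, 0],
    67: [4, 20, 59, 96, 54, 23, 7, 0, 0],
    72: [0, 4, 30, 54, 74, 43, 9, 0, 0],
    77: [0, 0, 7, 18, 31, 50, 19, 5, 0],
    82: [0, 0, 0, 2, 5, 13, 15, 8, 3],
    87: [0, 0, 0, 0, 0, 2, 5, 8, 2],
}

cells = [(x, y, f) for y, row in table.items() for x, f in zip(x_mid, row)]
N = sum(f for _, _, f in cells)
xbar = sum(f * x for x, _, f in cells) / N
ybar = sum(f * y for _, y, f in cells) / N
var_x = sum(f * (x - xbar) ** 2 for x, _, f in cells) / N
var_y = sum(f * (y - ybar) ** 2 for _, y, f in cells) / N
cov = sum(f * (x - xbar) * (y - ybar) for x, y, f in cells) / N

r = cov / sqrt(var_x * var_y)
b = cov / var_x              # gradient of the line of regression of y on x
a = ybar - b * xbar          # intercept of that line
```

With these frequencies N is 1,000, and r, b and a agree with the values 0.68, 0.656 and 29.7 quoted in the exercise to the figures given.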



CHAPTER V 


FURTHER CORRELATION THEORY. 
CURVED REGRESSION LINES 

33. Arrays. Linear regression 

In the numerical illustration of § 29 all the values of x in any one
vertical array were regarded as equal to the mid-value of x for that
interval. Since the actual values of x vary over a range of 5 years,
the results obtained cannot be regarded as more than a good
approximation. A better approximation could be obtained by choosing a
smaller class interval; but that would make the numerical work
correspondingly heavier. For accurate work the class interval must
be so small that all the values of x in a vertical array are either
exactly or very nearly equal; and similarly all the y's in a horizontal
array. The theoretical proof of certain formulae, however, is not
made any more difficult by a choice of arrays sufficiently numerous
to satisfy the above conditions; for our sums are just as easy to
manage whether the number of arrays is large or small. And, if
we are dealing with a continuous distribution, we may take
infinitesimal class intervals, dx and dy, and replace our sums by
definite integrals. We assume then that, in the ith vertical array, all
the x's have the same value, $x_i$.

It will be convenient to extend our subscript notation as follows.
Though the x's in the ith vertical array all have the same value, the
y's are different. A typical pair of values in the array is $(x_i, y_{ij})$,
represented by the point $P_{ij}$; and we shall denote its frequency by
$f_{ij}$. Thus the first subscript, i, indicates the vertical array, while the
second, j, indicates the position in that array. Denoting the total
frequency in the ith array by $n_i$ we have

$$n_i = \sum_j f_{ij}, \tag{1}$$

$\sum_j$ indicating summation within that array. The mean value of y
in this array will be denoted by $\bar{y}_i$. Thus

$$n_i\bar{y}_i = \sum_j f_{ij} y_{ij}. \tag{2}$$
The mean of the array is represented by the point $G_i$, whose
co-ordinates are $(x_i, \bar{y}_i)$; $H_i$ is the point with abscissa $x_i$ on the line of
regression of y on x. The mean, G, of the distribution also lies on
this line of regression, and has coordinates $(\bar{x}, \bar{y})$ given by

$$N\bar{x} = \sum_i\sum_j f_{ij} x_i = \sum_i n_i x_i, \qquad N\bar{y} = \sum_i\sum_j f_{ij} y_{ij} = \sum_i n_i \bar{y}_i. \tag{3}$$

The summation over the whole distribution is thus divided into
two summations; for $\sum_j$ denotes summation for terms in the same
vertical array, and then $\sum_i$ indicates summation of the result over
all the arrays.

It is worth noting that the same equation is obtained for the line
of regression of y on x, if each value of y in any array is replaced by
the mean value of y in that array. For the equation of this regression
line depends only on $\bar{x}$, $\bar{y}$, $\sigma_x^2$ and $\mu_{11}$. And, since the above change
does not affect the x's, $\bar{x}$ and $\sigma_x^2$ are unaltered by it. So also is $\bar{y}$,
in virtue of the second equation (3). As for $\mu_{11}$ we see that

$$N\mu_{11} = \sum_i\sum_j f_{ij} x_i y_{ij} - N\bar{x}\bar{y} = \sum_i n_i x_i \bar{y}_i - N\bar{x}\bar{y},$$

by (3), and is therefore unaltered by the above change of y's.
Consequently, the regression line of y on x is unaltered. It follows
that, if the means of the vertical arrays are collinear, their line must
coincide with the line of regression of y on x. This regression is then
said to be linear. Similarly, the regression of x on y is said to be
linear if the means of the horizontal arrays all lie on the line of
regression of x on y. It is possible for either regression to be linear,
or both.

It was shown in § 26 that $S_y^2 = \sigma_y^2(1 - r^2)$ gives the mean square
deviation of points from the line of regression of y on x. If this
regression is linear the means of the vertical arrays lie on this line
of regression; and, if further the vertical arrays all have the same
variance, this common variance must be $S_y^2$. The s.d. of each array
is then $\sigma_y\sqrt{1 - r^2}$. When the vertical arrays all have the same
variance, the regression of y on x is said to be homoscedastic.

34. Correlation ratios 

Consider next the deviation of the point $P_{ij}$ from the mean $G_i$
of the vertical array in which it lies. The sum of the squares of
these deviations for the whole distribution is denoted by $NS_y'^2$,
so that

$$NS_y'^2 = \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2. \tag{4}$$

Then, by analogy with § 26 (22), the correlation ratio,* $\eta_y$, of y on x
is defined by

$$\eta_y^2 = 1 - S_y'^2/\sigma_y^2, \tag{5}$$

$\eta_y$ being regarded as positive. Thus

$$S_y'^2 = \sigma_y^2(1 - \eta_y^2), \tag{6}$$

which corresponds to § 26 (21). From (5) it is clear that $\eta_y^2 \le 1$.
Also it is easy to show that $\eta_y^2 \ge r^2$. This is evident from a comparison
of (5) with § 26 (22), since $S_y'^2 \le S_y^2$, the sum of the squares of the
deviations in any array being least when they are measured from
the mean of the array. Thus

$$1 \ge \eta_y^2 \ge r^2. \tag{7}$$

When the regression of y on x is linear, the straight line of means of
arrays coincides with the line of regression, and $\eta_y^2$ is then equal
to $r^2$. A non-zero value of $\eta_y^2 - r^2$ is thus associated with a departure
of the regression from linearity.

* Some writers denote this 'ratio' by $\eta_{yx}$.
It is clear from (6) that the more nearly $\eta_y^2$ approaches unity, the
smaller is $S_y'^2$, and therefore the closer are the points to the curve of
means of the vertical arrays. If $\eta_y^2 = 1$, then $S_y' = 0$, so that all the
deviations are zero, and all the points lie on the curve of means.
There is then a functional relation between x and y. We may
therefore describe the correlation ratio as a measure of the degree to
which the association between the variables approaches a functional
relationship expressible in the form y = F(x), where F(x) is a
single-valued function of x. This should be compared with the
corresponding interpretation of r in § 26.

A convenient expression for $\eta_y$ can be found, involving the
s.d. $\sigma_{my}$ of the means of the vertical arrays, each mean being weighted
with the frequency of that array. This s.d. is given by

$$N\sigma_{my}^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2. \tag{8}$$

Now

$$\begin{aligned}
N\sigma_y^2 &= \sum_i\sum_j f_{ij}(y_{ij} - \bar{y})^2 = \sum_i\sum_j f_{ij}\left[(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})\right]^2 \\
&= \sum_i\sum_j f_{ij}(y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2,
\end{aligned}$$

the sum of products being zero, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ vanishes for each
array. The first sum on the right is $NS_y'^2$ in virtue of (4), and
the second is $N\sigma_{my}^2$. Consequently the equation is equivalent to

$$\sigma_y^2 = S_y'^2 + \sigma_{my}^2. \tag{9}$$

Thus the variance of the y's of the distribution is expressible as the
sum of two parts, of which the first is the variance within the
arrays, and the second the variance of the weighted means of the
arrays. Also, comparing (9) with (6), we see that

$$\eta_y^2 = \sigma_{my}^2/\sigma_y^2$$

and therefore

$$\eta_y = \sigma_{my}/\sigma_y. \tag{10}$$

The correlation ratio $\eta_y$ is therefore the ratio of the s.d. of the
weighted means of the arrays of y's to the s.d. of all the y's of the
distribution.
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In the same way, by considering horizontal arrays in each of 
which the value of y is constant, we may define the correlation ratio 
of X on y. For, if Oy is the mean of the jth horizontal array, with 
abscissa and denotes the sum of the squares of the devia- 
tions of the points, each from the mean of the horizontal array in 
which it lies, 7j% is given by 

( 11 ) 

and, as before, we have the relations 

( 12 ) 

and 7* =« o-^/cr*, (13) 

where cr^ is the s.d. of the means of the horizontal arrays, each 
mean being weighted with the frequency riy of that array. 


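35. Calculation of correlation ratios

Formula (10), giving the value of $\eta_y$, is clearly equivalent to

$$\eta_y^2 = \sum_i n_i(\bar y_i - \bar y)^2 \big/ N\sigma_y^2. \tag{14}$$

Now the sum in the numerator may be evaluated without calculating the
deviations $\bar y_i - \bar y$. For

$$\sum_i n_i(\bar y_i - \bar y)^2 = \sum_i n_i \bar y_i^2 - 2\bar y \sum_i n_i \bar y_i + N\bar y^2.$$

But $n_i \bar y_i = T_i$, where $T_i$ is the sum of the y's in the ith
vertical array. Hence

$$\sum_i n_i \bar y_i = \sum_i T_i = N\bar y = T, \tag{15}$$

where T is the sum of the y's for the whole distribution. We may
therefore write

$$\sum_i n_i(\bar y_i - \bar y)^2 = \sum_i \frac{T_i^2}{n_i} - \frac{2T^2}{N} + \frac{T^2}{N}
= \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N},$$

and (14) becomes

$$\eta_y^2 = \left( \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N} \right) \Big/ N\sigma_y^2. \tag{16}$$

Since the deviation from the mean is independent of the choice of
origin, while numerator and denominator of (14) are altered in the
same ratio by change of unit, the value of $\eta_y^2$ calculated from (16)
is unaffected by change of origin and unit.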
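
The arithmetic of formula (16) may be sketched in Python as follows. This is an illustration only: the function name `correlation_ratio` and the three small arrays of y values are assumptions made for the sketch, and are not data from the text.

```python
from math import sqrt


def correlation_ratio(arrays):
    """Correlation ratio of y on x, computed from formula (16).

    `arrays` is a list of vertical arrays; each inner list holds the
    y values observed for one value of x.
    """
    n_total = sum(len(a) for a in arrays)        # N
    grand_total = sum(sum(a) for a in arrays)    # T
    # Numerator of (16): sum of T_i^2 / n_i, less T^2 / N.
    between = sum(sum(a) ** 2 / len(a) for a in arrays) - grand_total ** 2 / n_total
    # Denominator N * sigma_y^2: sum of squares about the general mean.
    mean = grand_total / n_total
    total = sum((y - mean) ** 2 for a in arrays for y in a)
    return sqrt(between / total)


# Illustrative data (an assumption): three vertical arrays of y values.
arrays = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0]]
eta_y = correlation_ratio(arrays)
print(round(eta_y, 4))
```

Only the array totals $T_i$, the array frequencies $n_i$ and the grand total T enter the numerator, which is the point of (16): no deviation of an array mean need be formed.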

Example. Calculate the correlation ratios for the distribution of ages of
bridegroom and bride given in § 29.

Working in terms of v and the 5-year unit we have

$$T = -371, \qquad N = 1{,}000, \qquad T^2/N = 137.64.$$

The values of $T_i$ are given in the row prefixed V. Hence, $n_i$ being the
frequency of the ith array,

$$\sum_i \frac{T_i^2}{n_i} = \frac{(-28)^2}{n_1} + \frac{(-339)^2}{n_2} + \dots + \frac{(26)^2}{n_i} = 876.81,
\qquad N\sigma_y^2 = 1393.4.$$

Consequently

$$\eta_y^2 = \frac{876.81 - 137.64}{1393.4} = \frac{739.17}{1393.4} = 0.5305,$$

and $\eta_y = 0.729$ nearly.

This is only slightly greater than the value 0.726 found for r.

The reader should verify, in the same way, that $\eta_x^2 = 0.5504$, so that
$\eta_x = 0.742$. The departure of the regressions from linearity is only slight.


36. Other relations 

We shall now prove that the mean square deviation of the
weighted means of the vertical arrays from the line of regression
of y on x is equal to $\sigma_y^2(\eta_y^2 - r^2)$, that is to say

$$\sum_i n_i(\bar y_i - Y_i)^2 = N\sigma_y^2(\eta_y^2 - r^2), \tag{17}$$

where $Y_i$ is the estimate of y from the regression equation § 25 (16).
For, subtraction of (6) from § 26 (21) shows that

$$N\sigma_y^2(\eta_y^2 - r^2) = \sum_i \sum_j f_{ij}(y_{ij} - Y_i)^2 - \sum_i \sum_j f_{ij}(y_{ij} - \bar y_i)^2$$

$$= \sum_i \sum_j f_{ij}\,[\{(y_{ij} - \bar y_i) + (\bar y_i - Y_i)\}^2 - (y_{ij} - \bar y_i)^2]$$

$$= \sum_i n_i(\bar y_i - Y_i)^2,$$

the sum of products vanishing, since $\sum_j f_{ij}(y_{ij} - \bar y_i)$ is zero
for each array. Thus (17) has been established. The difference
$\eta_y^2 - r^2$ is zero only when each of the deviations $\bar y_i - Y_i$ is
zero. In that case the means of arrays all lie on the line of regression,
and the regression of y on x is linear. We thus see again that a non-zero
value of $\eta_y^2 - r^2$ is associated with departure of the regression
from linearity.

We may prove, for use in a later chapter, that the sum of squares
$\sum_i n_i(\bar y_i - \bar y)^2$ may be resolved into two separate sums. Thus

$$\sum_i n_i(\bar y_i - \bar y)^2 = \sum_i n_i[(\bar y_i - Y_i) + (Y_i - \bar y)]^2
= \sum_i n_i(\bar y_i - Y_i)^2 + \sum_i n_i(Y_i - \bar y)^2, \tag{18}$$

the sum of products vanishing, since

$$\sum_i n_i(\bar y_i - Y_i)(Y_i - \bar y) = \sum_i \sum_j f_{ij}(y_{ij} - Y_i)(Y_i - \bar y) = 0$$

in virtue of § 27 (30). The sum in the first member of (18) is
$N\sigma_{my}^2$, or $N\eta_y^2\sigma_y^2$ by (10). The first sum on the right
of (18) has just been proved equal to $N\sigma_y^2(\eta_y^2 - r^2)$; and the
final sum has the value $Nr^2\sigma_y^2$, in virtue of § 27 (32). The
equation (18) thus corresponds to the identity

$$N\eta_y^2\sigma_y^2 = N\sigma_y^2(\eta_y^2 - r^2) + Nr^2\sigma_y^2. \tag{19}$$
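
The resolution (18) can be checked numerically. The following Python sketch fits the line of regression of y on x by least squares and forms the three sums of (18); the function name `resolve_sum_of_squares` and the small data set are assumptions chosen for illustration.

```python
def resolve_sum_of_squares(points):
    """Return the three sums of (18) for a grouped distribution.

    `points` maps each abscissa x to the list of y values in the
    vertical array with that abscissa.
    """
    xs = [x for x, arr in points.items() for _ in arr]
    ys = [y for arr in points.values() for y in arr]
    n = len(ys)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope of the least-squares line of regression of y on x.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    left = first = second = 0.0
    for x, arr in points.items():
        n_i = len(arr)
        mean_i = sum(arr) / n_i                   # mean of the array
        estimate = y_bar + slope * (x - x_bar)    # Y_i on the regression line
        left += n_i * (mean_i - y_bar) ** 2
        first += n_i * (mean_i - estimate) ** 2
        second += n_i * (estimate - y_bar) ** 2
    return left, first, second


# Illustrative data (an assumption): three vertical arrays.
points = {1.0: [1.0, 2.0, 3.0], 2.0: [2.0, 4.0], 3.0: [5.0, 6.0, 7.0]}
left, first, second = resolve_sum_of_squares(points)
print(left, first + second)
```

The first returned value is the first member of (18), and the other two are the sums on the right; their agreement reflects the vanishing of the sum of products.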

37. Continuous distributions 

The preceding proofs may be adapted to a continuous distribution
by an appropriate change of notation, and a replacement of sums by
definite integrals. The vertical array, whose abscissa is x, we assume
to be of infinitesimal breadth dx. Then, if $f(x, y)$ is the relative
frequency density, the relative frequency for this array is

$$\int [f(x, y)\,dx]\,dy = f_1(x)\,dx, \tag{20}$$

where

$$f_1(x) = \int f(x, y)\,dy, \tag{21}$$

the integration with respect to y extending over the whole of that
array. The mean $\bar y_x$ of the y's in the array is given by

$$\bar y_x f_1(x) = \int y f(x, y)\,dy, \tag{22}$$

and this is the equation of the curve of means of the vertical arrays.
The mean of the whole distribution has coordinates

$$\bar x = \iint x f(x, y)\,dx\,dy = \int x f_1(x)\,dx, \qquad
\bar y = \iint y f(x, y)\,dx\,dy = \int \bar y_x f_1(x)\,dx, \tag{23}$$

integration with respect to x including all the vertical arrays. The
above relations correspond to the equations (1), (2) and (3) for a
discrete distribution.

The mean square deviation of the values of y, from the means
of the arrays in which they lie, is given by

$$S_y'^2 = \iint (y - \bar y_x)^2 f(x, y)\,dx\,dy, \tag{24}$$

and $\eta_y$ is defined as before by an equation of the form (5) or (6).
The variance of the weighted means of the vertical arrays is

$$\sigma_{my}^2 = \int (\bar y_x - \bar y)^2 f_1(x)\,dx, \tag{25}$$

and, as before, this may be expressed in terms of $\sigma_y^2$ and $S_y'^2$, since

$$\sigma_y^2 = \iint (y - \bar y)^2 f(x, y)\,dx\,dy
= \iint [(y - \bar y_x) + (\bar y_x - \bar y)]^2 f(x, y)\,dx\,dy
= S_y'^2 + \sigma_{my}^2, \tag{26}$$

the integral of the product being equal to zero; and from this it
follows, as in the case of a discrete distribution, that

$$\eta_y = \sigma_{my} / \sigma_y.$$

Formulae corresponding to those of § 36 may be similarly established.
For, by subtraction of (24) from the result in the Example of § 26,

$$\sigma_y^2(\eta_y^2 - r^2) = S_y^2 - S_y'^2 = \iint [(y - Y)^2 - (y - \bar y_x)^2] f(x, y)\,dx\,dy$$

$$= \iint [\{(y - \bar y_x) + (\bar y_x - Y)\}^2 - (y - \bar y_x)^2] f(x, y)\,dx\,dy$$

$$= \iint (\bar y_x - Y)^2 f(x, y)\,dx\,dy = \int (\bar y_x - Y)^2 f_1(x)\,dx,$$

and this is the mean square deviation of the weighted means of
arrays from the line of regression of y on x. And, corresponding to
(18), we have the resolution

$$\int (\bar y_x - \bar y)^2 f_1(x)\,dx = \int [(\bar y_x - Y) + (Y - \bar y)]^2 f_1(x)\,dx$$

$$= \int (\bar y_x - Y)^2 f_1(x)\,dx + \int (Y - \bar y)^2 f_1(x)\,dx,$$

the various integrals corresponding to terms of the identity

$$\eta_y^2\sigma_y^2 = \sigma_y^2(\eta_y^2 - r^2) + r^2\sigma_y^2.$$


38. Bivariate normal distribution 

We shall now consider briefly the continuous bivariate distribution,
which is a generalization of the normal distribution discussed
in Chapter III. It may be introduced simply as follows.* Assume
first that the variable x is normally distributed with s.d. $\sigma_1$. Then,
if the variable is measured from its mean, the probability that a
random value of x will fall in the interval dx is

$$dP_1 = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma_1^2} \right) dx.$$

Assume next that the regression of y on x is linear and homoscedastic.
Then, if $\sigma_2$ is the s.d. of y in the distribution, the common
variance of the arrays of y's is $\sigma_2^2(1 - \rho^2)$, where $\rho$ is the†
coefficient of correlation between the variables. Finally, assume that each
array of y's is normally distributed. Then, since the mean of each
array is on the line of regression

$$y = \rho x \sigma_2 / \sigma_1, \tag{27}$$

and the variance of each array is as stated, the probability that a
value of y, taken at random in an assigned vertical array, will fall
in the interval dy is

$$dP_2 = \frac{1}{\sigma_2\sqrt{2\pi(1 - \rho^2)}} \exp\left[ -\frac{1}{2\sigma_2^2(1 - \rho^2)} \left( y - \rho\frac{\sigma_2}{\sigma_1}x \right)^2 \right] dy.$$

* Cf. Rietz, 1927, 2, pp. 104-7.

† In the theory of sampling, r refers to the sample, and $\rho$ to the
population from which the sample is drawn.

By the theorem of compound probability, the chance of a pair of
values (x, y) falling in the elementary rectangle dx dy is

$$dP_1\,dP_2 = \frac{dx\,dy}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \right) \right].$$

The probability density $\phi(x, y)$ for the distribution is therefore

$$\phi(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \right) \right]. \tag{28}$$

Such a distribution is called a bivariate normal distribution, and the
variables are said to be normally correlated. The surface $z = \phi(x, y)$
is the normal correlation surface. Since (28) is of the same form in x
as in y, we may conclude that the regression of x on y is also linear,
the variance of each array of x's being $\sigma_1^2(1 - \rho^2)$. The values of x
in each such array are normally distributed, with mean on the line
of regression of x on y, whose equation is

$$x = \rho y \sigma_1 / \sigma_2. \tag{29}$$

The m.g.f. for the distribution is found below,* and from it the first
few moments are calculated.

The nature of the normal correlation surface is indicated in the
diagram. The curves along which the probability density (or the
relative frequency density) is constant are the homothetic ellipses

$$\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = \lambda^2. \tag{30}$$

With respect to these the line of regression of y on x is conjugate to
the y-axis. For the locus of the mid-points of chords x = const. is
the straight line (27). Similarly the line of regression of x on y is
conjugate to the x-axis.†
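
The construction of the density by compound probability can be checked numerically. The Python sketch below multiplies the normal density of x by the normal density of y within the vertical array at x, and compares the product with the closed form (28). The function names and the parameter values are assumptions made for the illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(v, mean, sd):
    """Density of a normal variate with the given mean and s.d."""
    return exp(-(v - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))


def density_compound(x, y, s1, s2, rho):
    """Density of x, times density of y in the array at x."""
    marginal = normal_pdf(x, 0.0, s1)
    # The array mean lies on the regression line (27); the array
    # variance is s2^2 (1 - rho^2).
    in_array = normal_pdf(y, rho * x * s2 / s1, s2 * sqrt(1 - rho ** 2))
    return marginal * in_array


def density_28(x, y, s1, s2, rho):
    """Closed form (28) of the bivariate normal density."""
    q = x ** 2 / s1 ** 2 - 2 * rho * x * y / (s1 * s2) + y ** 2 / s2 ** 2
    return exp(-q / (2 * (1 - rho ** 2))) / (2 * pi * s1 * s2 * sqrt(1 - rho ** 2))


# Assumed parameter values, for illustration only.
s1, s2, rho = 1.5, 0.8, 0.6
for x, y in [(0.0, 0.0), (1.0, -0.5), (-2.0, 1.2)]:
    print(density_compound(x, y, s1, s2, rho), density_28(x, y, s1, s2, rho))
```

The two columns printed agree at every point, which is the algebraic identity used in passing from $dP_1\,dP_2$ to (28).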


39. Intraclass correlation 

Let us now consider the correlation between the measures of 
some common characteristic for pairs of members of the same family 
or class. For example, we may be interested in the correlation
between the weights of brothers, or the heights of sisters. The 
relation between two members of the same family is a reciprocal 
one; for, if P belongs to the same family as Q, then Q belongs to the 
same family as P. Each pair of members, P and Q, will therefore 
contribute two entries to the correlation table. In one of them x 
will be the measure of the characteristic for P, and y the measure 
for Q ; in the other, x will be the measure for Q, and y for P. The table 
will thus be symmetrical. 

We shall consider only the case in which each of the h families
has the same number, k, of members. Then there are $k(k - 1)$ pairs
of values for each family, and

$$N = hk(k - 1) \tag{31}$$

gives the total number of pairs of values in the table. Let $x_{ij}$ denote
the measure of the characteristic for the jth member of the ith
family. Thus the first subscript indicates the family, and the
second the particular member of that family. Consequently i takes
the values 1, 2, ..., h, and j the values 1, 2, ..., k.

* See Ex. V, 3.

† For further properties of this distribution see Exx. 4, 5 and 6 at the
end of this chapter.

The x's and the y's of the correlation table are the hk values $x_{ij}$ in
different orders. Any one value $x_{ij}$, occurring as an x in the table,
will have as its y each of the other $k - 1$ values for the same family.
Thus each value $x_{ij}$ occurs $k - 1$ times as an x; and the mean $\bar x$ for
the bivariate distribution is therefore

$$\bar x = \frac{(k - 1)\sum_i \sum_j x_{ij}}{hk(k - 1)} = \frac{1}{hk} \sum_i \sum_j x_{ij}, \tag{32}$$

and, since the table is symmetrical, $\bar y$ has the same value. The
variance $\sigma_x^2$ of the x's is the same as the variance of the hk values
$x_{ij}$, since each of these occurs the same number of times in the
distribution. Thus

$$hk\sigma_x^2 = \sum_i \sum_j (x_{ij} - \bar x)^2, \tag{33}$$

and $\sigma_y^2$ has the same value, in virtue of symmetry. We may denote
this common variance by $\sigma^2$, so that

$$\sigma_x^2 = \sigma_y^2 = \sigma^2. \tag{34}$$

The coefficient of intraclass correlation, r, is given by the usual
formula, which in this case is equivalent to

$$hk(k - 1)\sigma^2 r = \sum_i \sum_j \sum_l (x_{ij} - \bar x)(x_{il} - \bar x) \tag{35}$$

$$(j, l = 1, 2, \dots, k;\ j \ne l;\ i = 1, 2, \dots, h).$$

The sum is a triple one; for the product $(x_{ij} - \bar x)(x_{il} - \bar x)$ is the
product of the deviations from the mean for the jth and lth members
of the ith family, and the sum must include all such products for
different values of j and l, and for all the families. We carry out first
the summation with respect to l, observing that l takes all integral
values from 1 to k except the value j. Thus the sum of the terms $x_{il}$
includes all values for the ith family except $x_{ij}$, and may therefore
be written $k\bar x_i - x_{ij}$, where $\bar x_i$ is the mean for the ith family.
The sum of the terms $\bar x$ for the $k - 1$ values of l is $(k - 1)\bar x$. The
triple sum in (35) may therefore be written as the double sum

$$\sum_i \sum_j (x_{ij} - \bar x)[k\bar x_i - x_{ij} - (k - 1)\bar x]
= k \sum_i \sum_j (x_{ij} - \bar x)(\bar x_i - \bar x) - \sum_i \sum_j (x_{ij} - \bar x)^2.$$

Now by (33) the last sum on the right is equal to $hk\sigma^2$. To evaluate
the other we first carry out the summation with respect to j. Thus

$$k \sum_i \sum_j (x_{ij} - \bar x)(\bar x_i - \bar x) = k \sum_i (\bar x_i - \bar x)(k\bar x_i - k\bar x)
= k^2 \sum_i (\bar x_i - \bar x)^2 = k^2 h \sigma_m^2,$$

where $\sigma_m^2$ is the variance of the means of the families.
Substituting these values in (35) we obtain

$$hk(k - 1)\sigma^2 r = hk^2\sigma_m^2 - hk\sigma^2.$$

The coefficient of intraclass correlation is therefore given by the
formula

$$r = \frac{k\sigma_m^2 - \sigma^2}{(k - 1)\sigma^2}. \tag{36}$$
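
Formula (36) may be compared with a direct calculation from the symmetrical table of ordered pairs. The Python sketch below does both for the heights of brothers given in Ex. V, 1 at the end of this chapter; the function names are assumptions made for the illustration.

```python
from itertools import permutations


def intraclass_by_formula(families):
    """Intraclass correlation from formula (36)."""
    h = len(families)
    k = len(families[0])
    values = [v for fam in families for v in fam]
    mean = sum(values) / (h * k)
    variance = sum((v - mean) ** 2 for v in values) / (h * k)          # sigma^2
    var_means = sum((sum(fam) / k - mean) ** 2 for fam in families) / h  # sigma_m^2
    return (k * var_means - variance) / ((k - 1) * variance)


def intraclass_by_pairs(families):
    """Product-moment correlation over all k(k-1) ordered pairs per family."""
    pairs = [p for fam in families for p in permutations(fam, 2)]
    n = len(pairs)                                 # N = hk(k-1), as in (31)
    mean = sum(x for x, _ in pairs) / n
    variance = sum((x - mean) ** 2 for x, _ in pairs) / n
    covariance = sum((x - mean) * (y - mean) for x, y in pairs) / n
    return covariance / variance


# Heights of brothers from Ex. V, 1 (five families of three).
families = [(67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73), (71, 72, 73)]
r_formula = intraclass_by_formula(families)
r_pairs = intraclass_by_pairs(families)
print(r_formula, r_pairs)
```

Both methods give the value 1/3 stated in that exercise. The formula needs only the hk values and the family means, whereas the direct method forms all $hk(k - 1)$ entries of the table.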


Curved Regression Lines 

40. Polynomial regression. Normal equations 

We have seen that the line of regression of y on x gives the best
representation of the behaviour of y with change of x, that can be
given by a straight line, the term 'best' indicating that the sum of
the squares of the deviations (or 'residuals') is less for this straight
line than for any other. But it is often apparent from the data that
the regression of y on x is far from linear; and it may be desirable to
find an equation of regression which affords a better representation
of the behaviour of y than can be given by a straight line. The
simplest type of non-linear regression equation is that in which
one of the variables is expressed as a polynomial in the other.
Polynomial regression of y on x is therefore represented by an
equation of the form

$$y = b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k, \tag{37}$$

in which the coefficients $b_s$ are constants. Our problem is to determine
these constants so that the sum of the squares of the residuals
is a minimum. The choice of the degree, k, of the polynomial is at
our disposal. If there are n different pairs of values in the distribution,
we can make the curve pass through all the representative
points by choosing $k = n - 1$. But, to keep the arithmetical work
reasonably simple, k must be fairly small, say 2, 3, or 4. The distribution
of points in the scatter diagram will frequently suggest the
shape of the curve of regression, and thus a suitable value for k.
Having decided on the value of k, we determine the coefficients
$b_s$ by the method of least squares.

With the notation of § 23 suppose there are n different pairs of
values, $f_i$ being the frequency of the pair $(x_i, y_i)$, and the total
frequency N being given by

$$N = \sum_i f_i. \tag{38}$$

Let $H_i$ be the point on the curve (37) with abscissa $x_i$. Then its
ordinate has the value

$$Y_i = b_0 + b_1 x_i + \dots + b_k x_i^k = \sum_{s=0}^{k} b_s x_i^s, \tag{39}$$

and the deviation of the point $P_i$ from the curve of regression is

$$y_i - Y_i = y_i - \sum_s b_s x_i^s. \tag{40}$$

The sum of the squares of the deviations is

$$NS_y^2 = \sum_i f_i (y_i - Y_i)^2. \tag{41}$$

We have to choose the coefficients so that this sum is a minimum;
and this is done by equating to zero the partial derivatives of
$NS_y^2$ with respect to these coefficients. We thus obtain the $k + 1$ normal
equations

$$\sum_i f_i x_i^s (y_i - Y_i) = 0 \qquad (i = 1, 2, \dots, n;\ s = 0, 1, 2, \dots, k). \tag{42}$$

Written separately, and in terms of the values of the given distribution,
these are

$$\sum_i f_i (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0,$$

$$\sum_i f_i x_i (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0,$$

$$\dots\dots$$

$$\sum_i f_i x_i^k (y_i - b_0 - b_1 x_i - \dots - b_k x_i^k) = 0. \tag{42'}$$

The coefficients $b_s$ are determined from these $k + 1$ equations, which
involve the sums of powers of $x_i$ from $\sum_i f_i x_i$ to $\sum_i f_i x_i^{2k}$, and
sums of products from $\sum_i f_i y_i$ to $\sum_i f_i x_i^k y_i$. For the particular
case $k = 1$ these equations are the same as § 25 (12).

When the values $x_i$ correspond to equal increments, h, and the
distribution of the frequencies $f_i$ is symmetrical about $\bar x$, the
equations (42') may be much simplified. Suppose first that n is odd
and equal to $2m + 1$. Then, by taking the origin of x at the middle
value of that variable, and the common increment h as unit of
measurement, we have for the values of x in the distribution

$$-m,\ -(m - 1),\ \dots,\ -1,\ 0,\ 1,\ \dots,\ (m - 1),\ m.$$

Hence, owing to the symmetry of the distribution of f's, the sums
of the odd powers of x are all zero. The sums of the even powers
may be written down from tables or algebraical formulae. If,
however, n is even and equal to 2m, we take the origin of x at the
mean of the middle pair of values, and $\tfrac12 h$ as the new unit. The
values of x then become

$$-(2m - 1),\ \dots,\ -3,\ -1,\ 1,\ 3,\ \dots,\ (2m - 1),$$

and the sums of the odd powers of x vanish as before. Moreover,
the equations (42') then consist of two groups, one involving
$b_0, b_2, \dots$ and the other $b_1, b_3, \dots$. Their solution is thus simplified.

Example. Fit a parabolic curve of regression of y on x to the seven pairs
of values

x:  1.0  1.5  2.0  2.5  3.0  3.5  4.0

y:  1.1  1.3  1.6  2.0  2.7  3.4  4.1

The dot diagram of the seven pairs of values suggests a parabola. Since n
is odd, and the values of x correspond to equal increments 0.5, we take the
origin at the middle value 2.5 with 0.5 as the new unit. This is equivalent to
the transformation

$$u = 2x - 5.$$

Each frequency $f_i$ is unity. The sums of odd powers of u are zero, and the
calculation may be arranged as in the accompanying table. The normal
equations are

$$16.2 = 7b_0 + 28b_2, \qquad 14.3 = 28b_1, \qquad 69.9 = 28b_0 + 196b_2,$$

which give immediately

$$b_0 = 2.07, \qquad b_1 = 0.511, \qquad b_2 = 0.061.$$

The regression equation is therefore

$$y = 2.07 + 0.511u + 0.061u^2 = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2,$$

which simplifies to $y = 1.04 - 0.20x + 0.24x^2$.


    x       u       y     u^2     u^4      uy    u^2 y

   1.0     -3     1.1      9      81     -3.3     9.9
   1.5     -2     1.3      4      16     -2.6     5.2
   2.0     -1     1.6      1       1     -1.6     1.6
   2.5      0     2.0      0       0       -       -
   3.0      1     2.7      1       1      2.7     2.7
   3.5      2     3.4      4      16      6.8    13.6
   4.0      3     4.1      9      81     12.3    36.9

 Totals     -    16.2     28     196     14.3    69.9
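
The normal equations (42') are linear in the coefficients and may be solved mechanically. The following Python sketch builds them from the sums of powers and products and solves them by Gaussian elimination, which is a choice made for this illustration and not a method prescribed in the text. The function name `fit_polynomial` is likewise an assumption. It is applied to the seven pairs of the Example, in the transformed variable $u = 2x - 5$.

```python
def fit_polynomial(xs, ys, degree, freqs=None):
    """Least-squares polynomial coefficients b_0, ..., b_k from (42')."""
    if freqs is None:
        freqs = [1] * len(xs)
    m = degree + 1
    # Row r of the system: sum_s b_s * sum_i f_i x_i^(r+s) = sum_i f_i x_i^r y_i.
    a = [[sum(f * x ** (r + s) for f, x in zip(freqs, xs)) for s in range(m)]
         for r in range(m)]
    b = [sum(f * y * x ** r for f, x, y in zip(freqs, xs, ys)) for r in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        known = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - known) / a[r][r]
    return coeffs


xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]
us = [2 * x - 5 for x in xs]            # origin at 2.5, unit 0.5
b0, b1, b2 = fit_polynomial(us, ys, 2)
print(round(b0, 2), round(b1, 3), round(b2, 3))
```

With the symmetrical values of u the matrix of the system has zeros wherever an odd power sum occurs, so that the equation for $b_1$ separates from the pair for $b_0$ and $b_2$, as noted above.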


41. Index of correlation

When considering the line of regression of y on x, we saw that
$|r|$ is the coefficient of correlation between the values $y_i$ and their
estimates $Y_i$ found from the regression equation, and that this
coefficient is connected with the mean square deviation from the
line of regression by the formula $S_y^2 = \sigma_y^2(1 - r^2)$. We propose to
show that a corresponding relation holds for polynomial regression.
Multiplying the normal equations (42') by $b_0, b_1, \dots, b_k$ respectively
and adding we obtain, in virtue of (39),

$$\sum_i f_i Y_i (y_i - Y_i) = 0. \tag{43}$$

Consequently (41) is equivalent to

$$NS_y^2 = \sum_i f_i y_i (y_i - Y_i) = \sum_i f_i y_i^2 - \sum_i f_i y_i Y_i. \tag{44}$$

From the first of the normal equations, $\sum_i f_i (y_i - Y_i) = 0$, it follows
that the y's and the Y's have the same mean. If this is taken as
origin for both variables, their variances are given by

$$N\sigma_y^2 = \sum_i f_i y_i^2, \qquad N\sigma_Y^2 = \sum_i f_i Y_i^2,$$

and the coefficient of correlation, R, between the $y_i$ and the $Y_i$ by

$$R\sigma_y\sigma_Y = \frac{1}{N} \sum_i f_i y_i Y_i = \frac{1}{N} \sum_i f_i Y_i^2 = \sigma_Y^2$$

in virtue of (43), so that

$$R\sigma_y = \sigma_Y. \tag{45}$$

Consequently (44) is equivalent to

$$S_y^2 = \sigma_y^2(1 - R^2), \tag{46}$$

which is analogous to § 26 (21).

The coefficient R is thus an indication of the closeness with which
the points of the scatter diagram approximate to the regression
curve (37). If $R = 1$, then $S_y^2 = 0$, and all the points $P_i$ lie on the
curve. For this reason R is often referred to as the index of correlation
for the regression curve (37). With each curve of regression
there is associated such an index, given by

$$R^2 = 1 - S_y^2 / \sigma_y^2, \tag{47}$$

where $S_y^2$ is the mean square deviation of the points $P_i$ from the
curve of regression. In virtue of (43) and (44) $NS_y^2$ in the case
of polynomial regression is given by

$$NS_y^2 = \sum_i f_i y_i^2 - \sum_s b_s \sum_i f_i y_i x_i^s. \tag{48}$$

Further, the relations (30) and (31) of § 27 hold also for polynomial
regression. For, in virtue of (43) and the first of the normal
equations, we have

$$\sum_i f_i (y_i - Y_i)(Y_i - \bar y) = 0. \tag{49}$$

Then the sum of the squares of the deviations of the y's from their
mean may be expressed

$$\sum_i f_i (y_i - \bar y)^2 = \sum_i f_i (y_i - Y_i)^2 + \sum_i f_i (Y_i - \bar y)^2, \tag{50}$$

in consequence of (49). This equation corresponds to the identity

$$N\sigma_y^2 = N\sigma_y^2(1 - R^2) + NR^2\sigma_y^2,$$

as is evident from (45) and (46).
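
The relations of this section may be verified numerically for the parabola fitted in the Example of § 40. In the Python sketch below the coefficients are the exact solutions of the three normal equations of that Example; the variable names are assumptions made for the illustration. The index R is computed once from (47) and once as the ordinary coefficient of correlation between the $y_i$ and the $Y_i$.

```python
from math import sqrt

us = [-3, -2, -1, 0, 1, 2, 3]
ys = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]

# Exact solutions of 16.2 = 7 b0 + 28 b2, 14.3 = 28 b1, 69.9 = 28 b0 + 196 b2.
b0, b1, b2 = 14.5 / 7, 14.3 / 28, 5.1 / 84
estimates = [b0 + b1 * u + b2 * u * u for u in us]       # the Y_i of (39)

n = len(ys)
y_bar = sum(ys) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n             # sigma_y^2
s_sq = sum((y - Y) ** 2 for y, Y in zip(ys, estimates)) / n   # S_y^2 of (41)

# Index of correlation from (47).
index_from_47 = sqrt(1 - s_sq / var_y)

# Coefficient of correlation between the y_i and the Y_i.
Y_bar = sum(estimates) / n
var_Y = sum((Y - Y_bar) ** 2 for Y in estimates) / n
cov = sum((y - y_bar) * (Y - Y_bar) for y, Y in zip(ys, estimates)) / n
index_direct = cov / sqrt(var_y * var_Y)

# Left-hand member of (43), which should vanish.
check_43 = sum(Y * (y - Y) for y, Y in zip(ys, estimates))

print(index_from_47, index_direct, check_43)
```

The two values of R agree, and the sum in (43) vanishes to within rounding. The index is very near unity here, the seven points lying close to the fitted parabola.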

Example. Prove, as in § 36, that $\sigma_y^2(\eta_y^2 - R^2)$ is the mean square
deviation of the weighted means of the vertical arrays from the curve of
regression. Also that, if $n_i$ is the frequency in the ith vertical array,

$$\sum_i n_i(\bar y_i - \bar y)^2 = \sum_i n_i(\bar y_i - Y_i)^2 + \sum_i n_i(Y_i - \bar y)^2,$$

corresponding to the identity

$$N\eta_y^2\sigma_y^2 = N\sigma_y^2(\eta_y^2 - R^2) + NR^2\sigma_y^2.$$

42. Some related regressions

A plotting of the values of x against the logarithms of the corresponding
values of y may indicate an approximation to a linear
relation between x and log y. In such a case we find a regression
equation of the form

$$y = ca^x, \tag{51}$$

where c and a are constants. For this relation is equivalent to

$$\log y = \log c + x \log a.$$

If then we find the line of regression of log y on x, the constants of
the equation determine both c and a.

Similarly, if the plotting of log x against log y indicates that the
relation between these quantities approximates to linear, we find
a regression equation of the form

$$y = cx^b, \tag{52}$$

which is equivalent to

$$\log y = \log c + b \log x.$$

We have therefore to find the line of regression of log y on log x, and
the constants of the equation determine c and b.

We might also find a regression equation of the form

$$y = ca^{f(x)}, \tag{53}$$

where $f(x)$ is a polynomial in x. For this is equivalent to

$$\log y = \log c + (\log a) f(x),$$

and therefore requires a polynomial regression of log y on x.
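
As an illustration of the first of these forms, the Python sketch below finds the constants c and a of (51) from the line of regression of log y on x. The function name `fit_exponential` and the data, which are generated from $y = 2 \cdot 3^x$, are assumptions made for the sketch. It should be observed that the sum of squares minimized is that of the residuals of log y, not of y itself.

```python
from math import exp, log


def fit_exponential(xs, ys):
    """Constants (c, a) of y = c * a**x from the regression of log y on x."""
    logs = [log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    # Slope and intercept of the least-squares line log y = log c + x log a.
    slope = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = l_bar - slope * x_bar
    return exp(intercept), exp(slope)


# Illustrative data (an assumption): exact values of y = 2 * 3**x.
xs = [0, 1, 2, 3, 4]
ys = [2 * 3 ** x for x in xs]
c, a = fit_exponential(xs, ys)
print(c, a)
```

The form (52) may be treated in the same way by passing the logarithms of the x's in place of the x's, the slope then giving b directly.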

More generally, the principle of least squares may be employed
to find a regression equation of the form*

$$y = b_1 X_1 + b_2 X_2 + \dots + b_p X_p,$$

where $X_1, \dots, X_p$ are any functions of the independent variable x.
The argument used in § 40 leads to the normal equations

$$\sum_i f_i X_s(x_i)\,(y_i - Y_i) = 0 \qquad (s = 1, 2, \dots, p),$$

where $Y_i = \sum_t b_t X_t(x_i)$, for the determination of the coefficients $b_s$.

* Cf. Fisher, Annals of Eugenics, vol. IX, part 3, p. 238 (1939).

Multiplying these equations by $b_1, \dots, b_p$ respectively and adding, we find

$$\sum_i f_i Y_i (y_i - Y_i) = 0,$$

so that the sum of squares of the deviations from the regression
curve is

$$NS_y^2 = \sum_i f_i (y_i - Y_i)^2 = \sum_i f_i y_i (y_i - Y_i)$$

as in § 41. If one of the quantities $X_s$ is taken as unity (or a constant),
the mean of the y's is equal to that of the Y's. The argument of § 41
then applies to the present case, leading to (45), (46), (49) and (50).

EXAMPLES V

1. The heights of brothers, in five families of three each, are
as follows: (67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73) and
(71, 72, 73) inches. Show that the mean heights for the families
are 68, 69, 70, 71, 72 inches, and the general mean $\bar x = 70$. Also
$\sigma^2 = 18/5$, $\sigma_m^2 = 2$, and the coefficient of intraclass correlation of
heights for the five families is 1/3.

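2. Verify that, for the distribution of Ex. IV, 8, the correlation
ratios are $\eta_x = 0.695$ and $\eta_y = 0.685$ approximately.

3. Moments and m.g.f. for the bivariate normal distribution. The
moments $\mu_{rs}$ may be deduced from the m.g.f. as defined in § 31. To
calculate this function let $\sigma_1$ and $\sigma_2$ be the s.d.'s of x and y in the
distribution, and $\rho$ the correlation between these variates. Then
the m.g.f. is given by

$$M(t_1, t_2) = c \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[ t_1 x + t_2 y - \frac{1}{2(1 - \rho^2)} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \right) \right] dx\,dy,$$

where

$$c = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}.$$

By transforming the exponential we may write this:

$$M(t_1, t_2) = c \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left\{ -\frac{[x - \rho\sigma_1 y/\sigma_2 - \sigma_1^2(1 - \rho^2)t_1]^2}{2\sigma_1^2(1 - \rho^2)} - \frac{y^2}{2\sigma_2^2} + \left( t_2 + \frac{\rho\sigma_1 t_1}{\sigma_2} \right) y + \tfrac12 \sigma_1^2(1 - \rho^2)t_1^2 \right\} dx\,dy.$$

Carrying out the integration with respect to x, and rearranging the
exponential, we have

$$M(t_1, t_2) = \frac{\exp[\tfrac12(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)]}{\sigma_2\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[ -\frac{(y - \rho\sigma_1\sigma_2 t_1 - \sigma_2^2 t_2)^2}{2\sigma_2^2} \right] dy$$

$$= \exp[\tfrac12(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)],$$

which is the required m.g.f.

To calculate the various moments we have only to expand this
function in powers of $t_1$ and $t_2$. The expansion is

$$1 + \tfrac12(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2) + \tfrac14(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)^2/2! + \dots.$$

Since $\mu_{rs}$ is the coefficient of $t_1^r t_2^s / r!\,s!$ in this expansion we have

$$\mu_{20} = \sigma_1^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2,$$

$$\mu_{22} = (1 + 2\rho^2)\sigma_1^2\sigma_2^2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3, \quad \mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4,$$

and so on.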

4. Show that the area of the ellipse, § 38 (30), of constant probability
density is $\pi\lambda^2\sigma_1\sigma_2/\sqrt{1 - \rho^2}$; and hence that the area of the
strip between the ellipses corresponding to the parameter values
$\lambda$ and $\lambda + d\lambda$ is $2\pi\lambda\,d\lambda\,\sigma_1\sigma_2/\sqrt{1 - \rho^2}$. Deduce the probability

$$\exp[-\lambda^2/2(1 - \rho^2)]\,\lambda\,d\lambda/(1 - \rho^2)$$

that a pair of values (x, y) chosen at random, will be represented by
a point inside this strip; and hence by integration the probability
that the point will fall inside the ellipse $\lambda$ is

$$1 - \exp[-\lambda^2/2(1 - \rho^2)].$$

If this probability is $\tfrac12$, then $\lambda^2 = 1.3863(1 - \rho^2)$.

Also show that, for a given value of $d\lambda$, the probability that the
point (x, y) will fall in the strip between the ellipses $\lambda$ and $\lambda + d\lambda$ is
a maximum when $\lambda^2 = 1 - \rho^2$. This determines the 'ellipse of
maximum probability'. (Cf. Rietz, 1927, 2, pp. 108-10.)

5. The variates x and y are normally correlated, and $\xi$, $\eta$ are
defined by

$$\xi = x\cos\theta + y\sin\theta, \qquad \eta = y\cos\theta - x\sin\theta.$$

Show that $\xi$, $\eta$ will be uncorrelated if

$$\tan 2\theta = 2\rho\sigma_1\sigma_2/(\sigma_1^2 - \sigma_2^2).$$

The above transformation corresponds to a rotation of rectangular
axes of coordinates; and $\theta$ determines the directions of the principal
axes of the ellipses of constant density.

Show that, if $\xi$, $\eta$ are thus uncorrelated, and $\sigma_\xi^2$, $\sigma_\eta^2$ are their
variances, then

$$\sigma_\xi\sigma_\eta = \sigma_1\sigma_2\sqrt{1 - \rho^2}, \qquad \sigma_\xi^2 + \sigma_\eta^2 = \sigma_1^2 + \sigma_2^2.$$

6. Show that, if $\xi$, $\eta$ are independent normal variates, and x, y
are defined by

$$x = \xi\cos\theta + \eta\sin\theta, \qquad y = \eta\cos\theta - \xi\sin\theta,$$

the coefficient of correlation between x and y is given by (cf. Ex. 5)

$$1 - r^2 = \frac{4\sigma_\xi^2\sigma_\eta^2}{(\sigma_\xi^2 - \sigma_\eta^2)^2 \sin^2 2\theta + 4\sigma_\xi^2\sigma_\eta^2},$$

which is numerically greatest when $\theta = \pm\tfrac14\pi$. The extreme values
of r are $\pm(\sigma_\xi^2 - \sigma_\eta^2)/(\sigma_\xi^2 + \sigma_\eta^2)$.

Also show that the points of inflexion of sections of the normal
correlation surface, by planes through the z-axis, lie on the elliptic
cylinder $\xi^2\sigma_\eta^2 + \eta^2\sigma_\xi^2 = \sigma_\xi^2\sigma_\eta^2$. (Cf. Yule, 1897, 1.)

7. The profits, £y, of a certain company in the xth year of its
life are given by

x:     1     2     3     4     5

y:  1250  1400  1650  1950  2300

Taking $u = x - 3$ and $v = (y - 1650)/50$, show that the parabolic
regression of v on u is

$$v + 0.086 = 5.30u + 0.643u^2,$$

and deduce that the parabolic regression of y on x is

$$y = 1140 + 72.14x + 32.14x^2.$$


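8. Let x, y be normally correlated variates with zero means as
in § 38. Writing

$$w = \frac{x}{\sigma_1}, \qquad z = \frac{1}{\sqrt{1 - \rho^2}} \left( \frac{y}{\sigma_2} - \frac{\rho x}{\sigma_1} \right),$$

show that

$$\frac{\partial(w, z)}{\partial(x, y)} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}}$$

and

$$w^2 + z^2 = \frac{1}{1 - \rho^2} \left( \frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \right).$$

Deduce that the joint probability differential of w and z is

$$dP = \frac{1}{2\pi} \exp[-\tfrac12(w^2 + z^2)]\,dw\,dz,$$

and hence that w, z are independent normal variates, with zero
means and unit s.d.'s. In other words w and z are independent
standard normal variates.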



CHAPTER VI 


THEORY OF SIMPLE SAMPLING

43. Random sampling from a population 

In order to examine a large population with respect to a specified
characteristic, the statistician chooses a sample of individuals from
that population and, from the properties of the sample relating to
the given characteristic, he endeavours to estimate those of the
population. Suppose that the characteristic considered is the height
of the individual. Then the assemblage of heights of all the individuals
in the population is called a population or universe of heights, and
those of the individuals in the sample is a sample of heights from that
population. Similarly, we might consider populations of weights,
wages, yields of grain, etc. In the same way, if our consideration is
the percentage of male births in a very large population of births,
we may be obliged, in estimating this percentage, to confine our
attention to the data provided by a sample of such births. The theory
of sampling is concerned, first, with estimating the properties of the
population from those of the sample, and secondly, with gauging
the precision of the estimates, i.e. with ascertaining the deviations
from the true values that may be expected in the estimates obtained.

Fundamental to the theory is the concept of random sampling.
This is defined by the property that, in the selection of an individual
from the population, each member of the population has the same
chance of being chosen.* Statisticians have developed techniques
for ensuring, as far as possible, that their sampling is random; but
the reader who wishes to study the details of these techniques must
consult other works.†

* See also Kendall, 1941, 2.
† E.g. Yule and Kendall, 1937, 1, pp. 336-46.

Sampling of Attributes

44. Simple sampling of attributes

In the sampling of attributes, as distinct from the sampling of
values of a variable such as height, we are concerned only with the
possession or non-possession of some specified attribute or characteristic
by the individual selected in sampling. For instance, in sampling
from births we may be concerned only whether the baby is male or
not. In sampling from a population of men our consideration may
be whether they are smokers or non-smokers. The choosing of an
individual in sampling may be called a 'trial', and the possession
of the specified attribute by the individual selected a 'success'.
Simple sampling is random sampling with the further provision that
the probability p of success is the same at each trial. Thus p is a
constant in the process; and the probability of success at any trial
is independent of the success or failure of preceding trials. The value
of p is the relative frequency of the occurrence of the attribute in
the population from which the sample is drawn. Hence, for the
sampling to be simple, either the population must be very large, or
the individual selected must be returned to the population before
the next trial, success or failure having been noted.

The problem connected with the drawing of a simple sample of
n members is thus identical with that of a series of n independent
trials, with constant probability p of success; and the results of
§§ 11 and 17 are applicable. The probabilities of 0, 1, 2, ... successes
in a simple sample of n members are thus the terms of the binomial
expansion of $(q + p)^n$. The binomial probability distribution thus
determined is called the sampling distribution of the number of
successes in the sample. The expected value, or mean value, of the
number of successes is therefore np; the variance is npq, and the
standard deviation is $\sqrt{npq}$. This s.d. is usually called the standard
error (s.e.) of the number of successes in a sample of size n, the
deviation from the expected value np being looked upon as 'error'.
The proportion of successes in a sample is obtained by dividing the
number of successes by n. The expected value of the proportion of
successes is therefore p; and the s.d. of the proportion of successes
is 1/n times that of the number of successes, i.e. $\sqrt{pq/n}$. Thus

s.e. of the number of successes = $\sqrt{npq}$,

s.e. of the proportion of successes = $\sqrt{pq/n}$.

The precision of the proportion of successes observed in the sample
is regarded as inversely proportional to the s.e. of this proportion.
Hence the precision of the observed proportion varies as $\sqrt{n}$. In
particular, to double the precision it is necessary to increase the
size of the sample four-fold.
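
The two standard errors, and the four-fold rule, may be illustrated by a short Python sketch. The function names and the values n = 400, p = 0.5 are assumptions chosen for the illustration.

```python
from math import sqrt


def se_number(n, p):
    """Standard error of the number of successes in a simple sample of n."""
    return sqrt(n * p * (1 - p))


def se_proportion(n, p):
    """Standard error of the proportion of successes in a simple sample of n."""
    return sqrt(p * (1 - p) / n)


# Assumed values, for illustration only.
n, p = 400, 0.5
se_small = se_proportion(n, p)
se_large = se_proportion(4 * n, p)      # sample increased four-fold
print(se_number(n, p), se_small, se_large)
```

The s.e. of the proportion for the sample of 1,600 is half that for the sample of 400, so that the precision has been doubled.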

45. Large samples. Test of significance 

As we have just seen, the sampling distributions of the number 
and the proportion of successes in a simple sample of size n are 
binomial distributions. It was also shown in Chapter iii that, for 
large values of n, the binomial distribution approximates to a 
normal distribution, in the sense that the probabilities for corre- 
sponding intervals in the two distributions tend to equality as n 
increases indefinitely. Now we know that the probability that a 
random value of a normal variate, of s.d. cr, will lie outside the 
interval which extends So* on each side of the mean is only 0*0027. 
Similarly, the probability that the value will deviate from the 
mean by more than 2cr is 0*0456, or about 4| %. We may therefore 
conclude that, for large values of n, the probability that the number 
of successes in a simple sample of n members will difier from the 
mean by more than three times the S.E. is also very small; and that 
a deviation of more than twice the s.e. is rather unusual. 

Bearing this in mind we have a test of the credibility of the
hypothesis that a given large sample, of n members, was obtained
by simple sampling from a population in which the relative frequency
of the occurrence of the attribute considered is p. Suppose it is
found that the number of successes in the sample differs from the
expected value np by more than $3\sqrt{npq}$. Then an event has
happened which, on the hypothesis of simple sampling, is very
improbable. We conclude then that the truth of this hypothesis is itself
very improbable, and we say that the difference is highly significant.
Consideration of the aspects of the problem will then lead us to
suspect either that the value of p employed is incorrect, or else that
the conditions of simple sampling were not observed. A deviation
from the mean less than twice the s.e. is regarded as not significant.
For deviations greater than twice the s.e. the significance increases
with the deviation. The dividing line between significance and
non-significance is, of course, not sharply defined. But significance is
usually regarded as beginning where the probability of a larger
deviation is less than 5 %. It may be remarked that, while the
above test may furnish evidence against the hypothesis, it cannot
prove the hypothesis to be correct. The most it can do in its favour
is to provide no evidence against it.
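
The test just described can be sketched in a few lines of Python. This is only an illustration under the normal approximation: the function names, the wording of the verdicts and the data (530 successes in 1,000 trials, tested against p = 1/2) are assumptions made for the sketch.

```python
import math


def deviation_in_se(successes, n, p):
    """Deviation of the observed count from np, in units of sqrt(npq)."""
    return (successes - n * p) / math.sqrt(n * p * (1 - p))


def two_sided_tail(z):
    """Probability that a normal variate deviates from its mean by more than |z| s.d."""
    return math.erfc(abs(z) / math.sqrt(2))


def verdict(z):
    """Rule of thumb of the text: under 2 s.e. not significant, over 3 highly so."""
    if abs(z) > 3:
        return "highly significant"
    if abs(z) > 2:
        return "significant"
    return "not significant"


# Hypothetical data: 530 successes in 1,000 trials, tested against p = 1/2.
z = deviation_in_se(530, 1000, 0.5)
print(round(z, 2), round(two_sided_tail(z), 3), verdict(z))
# Tail probabilities for deviations of two and three times the s.d.
print(round(two_sided_tail(2), 4), round(two_sided_tail(3), 4))
```

The sharp cut-offs at 2 and 3 are a convenience of the sketch; as the text remarks, the dividing line is not sharply defined.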

The above argument still holds if, in place of the number of
successes and its s.e., we employ the proportion of successes in the
sample and its s.e. The expected value of this proportion in simple
sampling is p; and we compare the deviation of the actual proportion,
from this value, with the s.e., $\sqrt{pq/n}$, of the proportion. In some
cases the value of p in the population is not known, but must be
estimated from the sample. The estimate obtained from a large
sample may be used without serious error in place of the true value,
since the s.e. of the proportion of successes is small when n is large.

Example. A certain cubical die was thrown 9,000 times, and a 5 or a 6 was
obtained 3,240 times. On the assumption of random throwing, do the data
indicate an unbiased die?

On the hypothesis of an unbiased die the chance of throwing a 5 or a 6 is
1/3. Thus p = 1/3 and q = 2/3. The expected number of successes is therefore
3,000, and the deviation of the actual number from this value is 240. The
s.e., $\epsilon$, of the number of successes is

$$\epsilon = \sqrt{npq} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = 10\sqrt{20} = 44.72.$$

The deviation 240 is nearly 5.4 times this s.e.; and it is therefore most
unlikely to appear as a result of simple sampling with p = 1/3. We therefore
conclude that the die is almost certainly biased, and that p is not equal to 1/3.

The estimate of p obtained from the sample is 3,240/9,000 = 0.36. The
s.e. of the proportion of successes is then

$$\epsilon' = \sqrt{0.36 \times 0.64/9000} = 0.0050,$$

and $3\epsilon' = 0.015$. It is therefore most unlikely that the true value of p lies
outside the range 0.36 ± 0.015. In other words, the true value of p almost
certainly lies between 0.345 and 0.375.
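
The arithmetic of this example can be reproduced as follows. This is a sketch only; the variable names are illustrative, and the three-s.e. range is the rough rule used in the text rather than an exact probability statement.

```python
import math

n, successes = 9000, 3240
p0 = 1 / 3                      # chance of a 5 or a 6 with an unbiased die
expected = n * p0               # expected number of successes
se = math.sqrt(n * p0 * (1 - p0))
ratio = (successes - expected) / se

p_hat = successes / n           # estimate of p from the sample
se_prop = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 3 * se_prop, p_hat + 3 * se_prop

print(f"s.e. of count = {se:.2f}, deviation = {ratio:.2f} s.e.")
print(f"s.e. of proportion = {se_prop:.4f}, range = {low:.3f} to {high:.3f}")
```

The deviation comes out at rather more than five times the s.e., so that the conclusion of the example does not depend on the rounding of the intermediate figures.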


46. Comparison of large samples 

Let two populations, $P_1$ and $P_2$, be tested for the prevalence of a
certain attribute, by taking from them large simple samples of $n_1$
and $n_2$ members respectively; and let $p_1$ and $p_2$ be the observed
proportions of successes in the samples. Is the difference, $p_1 - p_2$,
significant of a real difference between the two populations with
respect to the given attribute? On the hypothesis that the populations
are similar in this respect, we may combine the samples to
estimate the common value of the relative frequency of the
occurrence of the attribute in the populations. This estimate is then

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}. \tag{1}$$

The s.e.'s of the proportions of successes in samples of $n_1$ and $n_2$
members are $\sqrt{pq/n_1}$ and $\sqrt{pq/n_2}$ respectively; and, since the
samples are independent, the variance $\epsilon^2$ of the difference of these
proportions is given by

$$\epsilon^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \tag{2}$$

in consequence of the theorem proved in §§ 10 and 16.

On the assumption that the populations are similar, the expected
value of the difference is zero, for

$$E(p_1 - p_2) = E(p_1) - E(p_2) = 0.$$

The sampling distributions of $p_1$ and $p_2$ are approximately normal,
when $n_1$ and $n_2$ are large; and the same is true of their difference
since the samples are independent. Thus the distribution of $p_1 - p_2$
is approximately normal, with mean zero and s.d. $\epsilon$. The probability
that, in simple sampling, the difference $p_1 - p_2$ will be numerically
greater than $3\epsilon$ is therefore very small. The probability that it will
be greater than $2\epsilon$ is in the neighbourhood of 5 %; and any value
smaller than this is regarded as not significant, i.e. as providing no
evidence against the hypothesis.

Example 1. In a simple sample of 600 men from a certain large city, 400
are found to be smokers. In one of 900 from another large city, 450 are
smokers. Do the data indicate that the cities are significantly different with
respect to the prevalence of smoking among men?

Here $p_1 = 2/3$, $p_2 = 1/2$, so that $p_1 - p_2 = 1/6$. On the assumption that the
cities are alike with respect to the prevalence of smoking among men, we have
as our estimate of the common value of p,

$$p = \frac{850}{1500} = \frac{17}{30},$$

and the variance of the difference of the proportions for the two samples is

$$\epsilon^2 = \frac{17}{30} \cdot \frac{13}{30}\left(\frac{1}{600} + \frac{1}{900}\right) = 0.000682.$$

Hence $\epsilon = 0.026$. The observed difference is greater than $6\epsilon$, and is therefore
highly significant. Our assumption that the populations are similar is
therefore almost certainly wrong.
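
The pooled comparison of this section lends itself to a short function. The sketch below, in Python, follows formulae (1) and (2); the function name and its return values are illustrative choices, and the data are those of Example 1.

```python
import math


def compare_proportions(x1, n1, x2, n2):
    """Pooled test of the difference of two sample proportions, as in (1) and (2).

    Returns the observed difference, its variance on the hypothesis of
    similar populations, and the difference in units of its s.e.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                   # pooled estimate (1)
    variance = p * (1 - p) * (1 / n1 + 1 / n2)  # variance (2)
    return p1 - p2, variance, (p1 - p2) / math.sqrt(variance)


# Figures of Example 1: 400 smokers in 600, and 450 in 900.
diff, variance, ratio = compare_proportions(400, 600, 450, 900)
print(f"difference = {diff:.4f}, variance = {variance:.6f}, ratio = {ratio:.2f}")
```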


Next suppose that simple samples, of $n_1$ and $n_2$ members, are
drawn from populations in which the proportions are $p_1$ and $p_2$
respectively ($p_1 > p_2$). Is it likely that the proportions $p_1'$ and $p_2'$ in
the samples will be such that $p_1' - p_2' \le 0$; in other words, is the real
difference between the populations likely to be hidden in sampling?
As before, the distribution of $p_1' - p_2'$ is approximately normal for
large values of $n_1$ and $n_2$; but the mean of the distribution is now
$p_1 - p_2$, and its variance, being the sum of the variances of $p_1'$ and
$p_2'$, is given by

$$\epsilon^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}. \tag{3}$$

In order that the sample value of $p_1' - p_2'$ should be negative, its
deviation from the mean must be on the negative side, and
numerically greater than $p_1 - p_2$. If $p_1 - p_2 > 3\epsilon$, the probability of this is
very small. If, however, $p_1 - p_2$ is much less than $2\epsilon$, such an event
would not be very unusual. For the particular case in which
$p_1 - p_2 = 2\epsilon$, the probability of the event is in or near the interval
2 to 2½ %.


Example 2. In two large populations there are 35 and 30 % of fair-haired
people. Is the difference likely to be revealed by simple samples of 1,500
and 1,000 respectively from the two populations?

Here $p_1 = 0.35$ and $p_2 = 0.30$, so that $p_1 - p_2 = 0.05$. The variance of the
difference of the proportions in the samples is

$$\epsilon^2 = \frac{0.35 \times 0.65}{1500} + \frac{0.3 \times 0.7}{1000} = 0.000362,$$

so that $\epsilon = 0.019$. The difference $p_1 - p_2$ is about $2.6\epsilon$. The probability that
the real difference between the populations will be hidden is approximately
the probability that, for a random value of a normal variate, the deviation
from the mean will be on the negative side and greater than 2.6 times the
s.d. Since this is less than ½ %, it is unlikely that the difference will be hidden.
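
The probability that a real difference is hidden can be evaluated directly from the normal approximation. The following sketch uses Python's standard `statistics.NormalDist`; the function name is an illustrative assumption, and the sample sizes 1,500 and 1,000 are those appearing in the variance calculation of Example 2.

```python
import math
from statistics import NormalDist


def prob_hidden(p1, n1, p2, n2):
    """Approximate probability that the sample difference is zero or negative
    when the population proportions are p1 > p2, using the variance (3)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist().cdf(-(p1 - p2) / se)


# Example 2, with samples of 1,500 and 1,000.
print(round(prob_hidden(0.35, 1500, 0.30, 1000), 4))
# The case p1 - p2 = 2 s.e. corresponds to the lower tail beyond two s.d.
print(round(NormalDist().cdf(-2), 4))
```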


47. Poissonian and Lexian sampling. Samples of varying size 
Consider next a few modifications of the condition of simple
sampling, beginning with Poisson's series of trials (cf. § 11, Ex. 3).
Suppose that, in drawing a sample of attributes of n members, the
chance of success changes at each drawing. Let $p_i$ be the probability
of success at the ith drawing. Then the expected value of the
number of successes in the sample is the sum of the expectations
at the individual drawings; and this is

$$\sum p_i = np,$$

where p is the mean of the quantities $p_i$. Also, since the drawings
are independent, the variance of the number of successes in the
sample is the sum of the variances of the numbers of successes at
the separate drawings, so that

$$\epsilon^2 = \sum p_i q_i = \sum p_i - \sum p_i^2.$$

If $\sigma_p^2$ is the variance of the quantities $p_i$, we may write this

$$\epsilon^2 = np - n(p^2 + \sigma_p^2) = npq - n\sigma_p^2. \tag{4}$$

Thus the variance of the number of successes is less than when the
probability remains constant and equal to p. Dividing by $n^2$ we
have the variance of the proportion of successes in a Poissonian
sample

$$\frac{pq}{n} - \frac{\sigma_p^2}{n}. \tag{5}$$
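
Formula (4) is easily verified for any particular set of probabilities. In the sketch below the four values of $p_i$ are hypothetical and the function names are illustrative; the variance of the $p_i$ is taken as their mean square deviation (population variance).

```python
from statistics import fmean, pvariance


def poissonian_variance_direct(probs):
    """Variance of the number of successes: sum of p_i * q_i over the drawings."""
    return sum(p * (1 - p) for p in probs)


def poissonian_variance_formula(probs):
    """The same variance from (4): npq - n * var(p_i), with p the mean of the p_i."""
    n = len(probs)
    p = fmean(probs)
    return n * p * (1 - p) - n * pvariance(probs)


probs = [0.1, 0.3, 0.5, 0.7]  # hypothetical chances at four drawings
print(poissonian_variance_direct(probs), poissonian_variance_formula(probs))
```

Both values fall short of npq, the variance with a constant probability equal to the mean of the $p_i$.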


Consider next a Lexian series of trials.* Suppose that, in taking
N simple samples of attributes of n members each, the probability
of success varies from one sample to another. Let $p_i$ be the value
in the ith sample (i = 1, 2, ..., N). We wish to find the s.e. of the
number of successes per sample, when the records of all the samples
are pooled. Let p be the mean value of the probability, so that
$Np = \sum p_i$. Then the expected value of the number of successes in
the whole series of samples is equal to the sum of the expected values
for the individual samples, and this is

$$\sum n p_i = nNp.$$

* Studied by W. Lexis in 1877.

Hence the expected value of the number of successes per sample is
np. To find the variance $\epsilon^2$ of the number of successes per sample,
we observe that the mean square deviation from np in the ith
sample is

$$n p_i q_i + n^2 (p_i - p)^2.$$

Summing for all the N samples, and equating to $N\epsilon^2$, we have

$$N\epsilon^2 = n \sum p_i q_i + n^2 \sum (p_i - p)^2.$$

Now the last term has the value $n^2 N \sigma_p^2$, where $\sigma_p^2$ is the variance of
the quantities $p_i$. In the other sum we may write

$$\sum p_i q_i = \sum p_i - \sum p_i^2 = Np - N(\sigma_p^2 + p^2) = Npq - N\sigma_p^2.$$

Substituting these values in the equation, and dividing by N, we
obtain

$$\epsilon^2 = npq + n(n - 1)\sigma_p^2, \tag{6}$$

which is the required variance of the number of successes per sample.
Dividing by $n^2$ we have the variance of the proportion of successes
per sample, viz.

$$\frac{pq}{n} + \frac{n - 1}{n}\sigma_p^2. \tag{7}$$

Both the variances (6) and (7) are greater than in the case of simple
sampling with constant probability p.
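
Formula (6) may be checked against an exact calculation, in which the pooled record is treated as an equally weighted mixture of binomial distributions. The sketch below is illustrative only: the three probabilities and the sample size 10 are hypothetical, and the function names are assumptions.

```python
import math
from statistics import fmean, pvariance


def lexian_variance_exact(n, probs):
    """Variance of successes per sample, from the pooled mixture of binomials."""
    weight = 1 / len(probs)
    pmf = [
        weight * sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for p in probs)
        for k in range(n + 1)
    ]
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pmf))


def lexian_variance_formula(n, probs):
    """Variance from (6): npq + n(n - 1) * var(p_i)."""
    p = fmean(probs)
    return n * p * (1 - p) + n * (n - 1) * pvariance(probs)


probs = [0.2, 0.4, 0.6]  # hypothetical chances in three samples of 10
print(lexian_variance_exact(10, probs), lexian_variance_formula(10, probs))
```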

Lastly, we may consider the modification of simple sampling in
which the probability p of success remains constant, but the size n
of the sample varies about a mean $\bar{n}$ with variance $\sigma_n^2$. To find the
variance of the number of successes per sample, we observe first
that the mean number of successes per sample is $\bar{n}p$. Then, for
samples of size n, the mean square deviation of the number of
successes from np is npq. Consequently the mean square deviation
from the general mean $\bar{n}p$ is

$$npq + (np - \bar{n}p)^2.$$

The expected value of this mean square is the required variance of
the number of successes, so that

$$\epsilon^2 = \bar{n}pq + p^2\sigma_n^2. \tag{8}$$

We shall make use of this result in a later chapter.
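
Result (8) can be confirmed in the same exact manner, by pooling binomial distributions of different sizes with equal weights. The sizes 8, 10 and 12 and the value p = 0.3 in this sketch are hypothetical, and the function names are illustrative.

```python
import math
from statistics import fmean, pvariance


def varying_size_variance_exact(sizes, p):
    """Variance of successes per sample when equally likely sizes are pooled."""
    weight = 1 / len(sizes)
    pmf = [0.0] * (max(sizes) + 1)
    for n in sizes:
        for k in range(n + 1):
            pmf[k] += weight * math.comb(n, k) * p**k * (1 - p) ** (n - k)
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pmf))


def varying_size_variance_formula(sizes, p):
    """Variance from (8): mean(n) * p * q + p^2 * var(n)."""
    return fmean(sizes) * p * (1 - p) + p**2 * pvariance(sizes)


sizes = [8, 10, 12]  # hypothetical sample sizes
print(varying_size_variance_exact(sizes, 0.3), varying_size_variance_formula(sizes, 0.3))
```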


Sampling of Values of a Variable

48. Random and simple sampling 

We pass now to the consideration of sampling of values of a
variable and measurable quantity, such as height, age, yield of
grain, etc. Each member of the population of individuals, objects
or experiments provides a value of the variable; and we thus have
a population of values of the variable, and the frequency
distribution determined by it. In drawing a sample of n members from the
population we are choosing n values of the variable from those of
the distribution.

We have already defined random sampling as sampling in which
each member of the population has the same chance of being chosen.
In the case of a population of discrete values of the variable x, the
value $x_i$ may occur $f_i$ times. There are then $f_i$ members of the
population each equal to $x_i$. Hence the probability of the value $x_i$ in the
selection of an individual by random sampling is $f_i/N$, which is the
relative frequency of that value in the population. Similarly, if the
population of values of x has a continuous distribution with relative
frequency density f(x), the probability that, in the random selection
of an individual, the value of the variable will fall in the interval
dx is f(x) dx. Simple sampling is random sampling with the further
provision that, in the selection of an individual from the population,
the probability of obtaining a value of the variable within any specified
range remains constant throughout the sampling. In particular, with
a population of discrete values of the variable, the probability of
obtaining a specified value $x_i$ remains constant during the sampling.
Thus, in simple sampling, the system of probabilities associated with
any drawing is independent of the results of preceding drawings.

A population, whose distribution is continuous, contains an
infinite number of values in any finite interval in the range of the
variable. In drawing a finite random sample from such a population,
the probability associated with any interval remains unchanged,
and the sampling is therefore simple. Thus a finite random sample
from a population whose distribution is continuous is a simple sample.
It is the common practice to refer to such a sample as a 'random
sample'. If, however, the population contains only a limited number
of values, the sampling will not be simple unless each value selected
is returned to the population before the drawing of the next value.

49. Sampling distributions. Standard errors 

The distribution of the variable in the population has its mean,
variance, moments of higher order, partition values, etc., which
are spoken of generally as the parameters of the population.
Similarly, each simple sample from the population determines a
frequency distribution of the variable, from which the mean, variance,
etc., of the sample may be calculated. These may be regarded as
estimates of the values of the parameters of the population. Any
such estimate obtained from the sample is called a statistic. More
generally any function of the sample values, used as an estimate of
a parameter of the population, is called a statistic.

When the distribution of the variable x in the population is
known, it is theoretically possible to determine the probability
that the estimate z of any parameter, obtained from a simple sample
of n members, will lie in the interval dz. Let this probability be
denoted by $\phi(z)\,dz$. Then the density $\phi(z)$ determines a probability
distribution called the sampling distribution of that statistic for
simple samples of size n. Thus the sampling distribution is a
continuous distribution, determined by the nature of the population
and the size of the sample; and the s.d. of the sampling distribution
is called the standard error (s.e.) of that statistic for samples of n
members. The sampling distribution is often defined as the
distribution of the values of the statistic obtained from an infinite
(or very large) number of simple samples of the given size. This
alternative way of looking at it may be a help to the student. The
two definitions bear the same relation to each other as the a priori
definition of probability and the empirical definition. The reader
should bear in mind that a sampling distribution is essentially a
probability distribution.

Distributions of statistics, for random samples from a normal
population, will play a prominent part in later chapters. It was
shown in § 22 that the distribution of the means of such samples is
normal; and we know its s.d. In general, for normal populations, or
populations whose frequency curves are unimodal and only
moderately skew, the sampling distributions of many of the common
statistics approximate to the normal type as n increases indefinitely;
so that, for large samples, they possess the property that the
probability of a sample value of the statistic deviating from its mean by
more than three times its s.e. is very small. This enables us to apply
the test of § 45 to statistics obtained from large samples; and the
determination of the s.e. in such cases is a matter of importance.
Tests appropriate to small samples will be considered in Chapter X.

50. Sampling distribution of the mean

The distribution of the means of random samples of given size,
from a specified population, is a question of considerable importance.
Let the sampled population have mean $\mu$ and variance $\sigma^2$. We shall
first prove by elementary methods that the sampling distribution
of the mean $\bar{x}$, of random samples of size n, has $\mu$ for its mean and
$\sigma^2/n$ for its variance. We shall then prove more generally how all
the moments of the distribution of the mean may be deduced simply
from those of the population, by means of the properties of the
cumulative function.

Reasoning in terms of a population of discrete values, let the
relative frequency of the value $x_i$ in the population be $p_i$ (i = 1, 2, ..., k).
Then

$$\mu = \sum p_i x_i \tag{9}$$

and

$$\sigma^2 = \sum p_i (x_i - \mu)^2. \tag{10}$$

In the case of a sample of one member, the distribution of the mean
is clearly the distribution of the variable in the population. For the
single value $x_i$ in the sample, associated with probability $p_i$, is the
mean of the sample. The expected value of the mean of the sample
is therefore

$$E(x_i) = \sum p_i x_i = \mu.$$

Consequently the variance of the distribution of the mean of a
sample of one is

$$\sum p_i (x_i - \mu)^2 = \sigma^2,$$

as stated. For a sample of n values $x_j$ (j = 1, 2, ..., n), the mean $\bar{x}$
is given by

$$n\bar{x} = \sum x_j.$$

Taking the expected value of each member we have

$$E(n\bar{x}) = E\left(\sum x_j\right) = \sum E(x_j) = \sum \mu = n\mu,$$

so that

$$E(\bar{x}) = \mu. \tag{11}$$

Thus the mean of the population is the mean of the sampling
distribution of $\bar{x}$. Further, since the values in the sample are independent,
the sampling variance of their sum, $n\bar{x}$, is the sum of the variances
of the separate values. But each of these values is a sample of one
member, whose distribution has variance $\sigma^2$. Consequently the
variance of $n\bar{x}$ is $n\sigma^2$, and its s.d. is $\sigma\sqrt{n}$. The s.d. of $\bar{x}$ is therefore
$\sigma/\sqrt{n}$. This is the required s.e. of the mean.

If the distribution of values in the population is continuous, with
relative frequency density f(x), the relative frequency for the
interval dx is f(x) dx; and this is the probability of a random value
coming from that interval. To adapt the above argument we have
only to replace $p_i$ by f(x) dx, $x_i$ by x, and summation with respect
to i by integration extending over the values in the population.
Thus (9) and (10) become

$$\mu = \int x f(x)\,dx$$

and

$$\sigma^2 = \int (x - \mu)^2 f(x)\,dx.$$

Summation with respect to j remains unaltered, since it extends
only over the n values of the sample.

Example. A sample of 900 members is found to have a mean of 3.4 cm.
Could it be reasonably regarded as a simple sample from a large population,
whose mean is 3.25 cm. and s.d. 2.61 cm.?

The s.e. of the mean of a simple sample of 900 from such a population is
2.61/30 = 0.087 cm. The deviation of the mean of the sample from that of
the population is 0.15 cm. This deviation is less than twice the s.e. of the
mean, and is therefore not significant. We conclude that the given sample
might be one drawn from the population specified.
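
The result that the mean of a simple sample has mean $\mu$ and variance $\sigma^2/n$ can be verified exactly for a small discrete population by enumerating every possible ordered sample. The sketch below does this in Python; the population (values 1, 2, 4 with relative frequencies 0.5, 0.3, 0.2) is hypothetical, and the function name is an assumption. The last lines repeat the arithmetic of the example.

```python
import itertools
import math


def moments_of_sample_mean(values, probs, n):
    """Exact mean and variance of the mean of a simple sample of n values,
    found by enumerating every ordered sample with its probability."""
    outcomes = []
    for sample in itertools.product(range(len(values)), repeat=n):
        prob = math.prod(probs[i] for i in sample)
        xbar = sum(values[i] for i in sample) / n
        outcomes.append((xbar, prob))
    mean = sum(x * w for x, w in outcomes)
    variance = sum((x - mean) ** 2 * w for x, w in outcomes)
    return mean, variance


values, probs = [1, 2, 4], [0.5, 0.3, 0.2]  # hypothetical population
mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))
print(moments_of_sample_mean(values, probs, 3), (mu, sigma2 / 3))

# The example: s.e. of the mean and the deviation in units of it.
se_mean = 2.61 / math.sqrt(900)
print(round(se_mean, 3), round((3.4 - 3.25) / se_mean, 2))
```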

The various moments of the sampling distribution of $\bar{x}$ may be
found very simply* by using the additive property of cumulants,
proved in § 16. For, since $n\bar{x} = \sum x_j$, it follows from this property
that the cumulative function of $n\bar{x}$, being the sum of those of the
variates $x_j$, is given by

$$K(t;\, n\bar{x}) = nK(t;\, x) = n\kappa_1 t + n\kappa_2 \frac{t^2}{2!} + n\kappa_3 \frac{t^3}{3!} + \dots,$$

the cumulants $\kappa_r$ being those of the population, and the second
argument in brackets denoting the variate whose cumulative
function is indicated. But the rth moment of $\bar{x}$ is obtained from that
of $n\bar{x}$ by dividing by $n^r$. Hence the cumulative function of $\bar{x}$ is found
by substituting t/n for t in the above expansion. The cumulative
function of the distribution of the mean is therefore

$$K(t;\, \bar{x}) = \kappa_1 t + \frac{\kappa_2}{n}\frac{t^2}{2!} + \frac{\kappa_3}{n^2}\frac{t^3}{3!} + \frac{\kappa_4}{n^3}\frac{t^4}{4!} + \dots$$

The rth cumulant for the distribution of the mean of the sample
is thus found from that of the population by dividing by $n^{r-1}$. In
particular, the mean of the distribution of $\bar{x}$ is $\kappa_1$, and is therefore
the same as the mean of the population. The variance of $\bar{x}$ is
$\kappa_2/n$, or $\sigma^2/n$, and the third moment about the mean of the
distribution is $\kappa_3/n^2$.

* Cf. Fisher, 1929, 1, p. 202.
That the sampling distribution of the mean is approximately
normal for large samples, when the population has only moderate
skewness and excess, also follows from the results of § 16. For,
since the rth cumulant of the distribution of the mean is obtained
from that of the population by dividing by $n^{r-1}$, the skewness of the
distribution of the mean is

$$\frac{\kappa_3/n^2}{(\kappa_2/n)^{3/2}} = \frac{1}{\sqrt{n}} \cdot \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{1}{\sqrt{n}}\ \text{(skewness of the population)}.$$

Similarly, the excess of kurtosis of the distribution of the mean is

$$\frac{\kappa_4/n^3}{(\kappa_2/n)^2} = \frac{1}{n} \cdot \frac{\kappa_4}{\kappa_2^2} = \frac{1}{n}\ \text{(excess of population)}.$$

Thus, for large samples from a population of moderate skewness
and excess, the skewness and excess of the distribution of $\bar{x}$ are
small, and the distribution is approximately normal.
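
The reduction of skewness by the factor $1/\sqrt{n}$, and of excess by the factor 1/n, can be checked exactly by enumeration for a small discrete population. In this sketch the skewness is taken as $\mu_3/\mu_2^{3/2}$ and the excess as $\mu_4/\mu_2^2 - 3$, which is an assumption about the definitions intended; the population and the function name are hypothetical.

```python
import itertools
import math


def shape_of_mean(values, probs, n):
    """Exact skewness and excess of the mean of a simple sample of n values,
    taken here as mu_3 / mu_2^(3/2) and mu_4 / mu_2^2 - 3 respectively."""
    outcomes = [
        (sum(values[i] for i in s) / n, math.prod(probs[i] for i in s))
        for s in itertools.product(range(len(values)), repeat=n)
    ]
    mean = sum(x * w for x, w in outcomes)
    m2, m3, m4 = (sum((x - mean) ** r * w for x, w in outcomes) for r in (2, 3, 4))
    return m3 / m2**1.5, m4 / m2**2 - 3


values, probs = [0, 1, 5], [0.6, 0.3, 0.1]  # hypothetical skew population
skew_pop, excess_pop = shape_of_mean(values, probs, 1)
skew_4, excess_4 = shape_of_mean(values, probs, 4)
# For n = 4 the two ratios should be 1/sqrt(4) and 1/4.
print(skew_4 / skew_pop, excess_4 / excess_pop)
```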


51. Normal population. Fiducial limits for unknown mean 

Suppose that the population, from which the random sample of
n values is drawn, is a normal population with mean $\mu$ and s.d. $\sigma$.
Then the sample values are independent normal variates, with the
same mean and s.d.; and, by the theorem of § 22, their mean $\bar{x}$ is
normally distributed with mean $\mu$ and s.d. $\sigma/\sqrt{n}$. If we know $\sigma$
but not $\mu$, there is a range of possible values of $\mu$ for which the
observed mean $\bar{x}$ of the sample is not significant at any specified
level of probability. In the sampling distribution of the mean, the
relative deviation of $\bar{x}$ from its expected value is the ratio of $\bar{x} - \mu$
to the s.e. of $\bar{x}$; and this ratio is $(\bar{x} - \mu)\sqrt{n}/\sigma$. If then the observed
value $\bar{x}$ is not significant at the 5 % level of probability, this
relative deviation must be less than that which, in a normal
distribution, is exceeded numerically with a probability of 5 %. Such
a relative deviation is 1.96. Consequently the observed mean $\bar{x}$
will be not significant provided

$$\frac{|\bar{x} - \mu|\sqrt{n}}{\sigma} < 1.96,$$

which requires

$$\bar{x} - 1.96\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\sigma/\sqrt{n}.$$

The values $\bar{x} \mp 1.96\sigma/\sqrt{n}$ are called the 95 % fiducial limits, or
confidence limits, for the mean of the population corresponding to the
given sample. They are the limits within which $\mu$ must lie, in order
that the observed sample mean should not be significant at the
prescribed level of probability.

Similarly, we define the fiducial limits for other levels of
probability. Thus, in a normal distribution, the relative deviations
which are exceeded numerically with probabilities of 2 and 1 % are
2.33 and 2.58 respectively. Hence the 98 % fiducial limits for the
mean of the normal population, corresponding to the given sample,
are $\bar{x} \mp 2.33\sigma/\sqrt{n}$; and the 99 % fiducial limits are $\bar{x} \mp 2.58\sigma/\sqrt{n}$.


Example. Show that, in the example of the preceding section, if the
population is normal but its mean unknown, the 95 % fiducial limits for the
mean are 3.23 and 3.57 cm., and the 98 % fiducial limits 3.20 and 3.60 cm.
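
The limits asked for in this example can be computed as follows. The sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library to obtain the relative deviations (1.96, 2.33, 2.58) instead of reading them from a table; the function name and the `level` parameter are illustrative assumptions.

```python
import math
from statistics import NormalDist


def fiducial_limits(xbar, sigma, n, level=0.95):
    """Limits xbar -/+ z * sigma / sqrt(n), where z is exceeded numerically
    with probability 1 - level in a normal distribution."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Sample of the preceding section: n = 900, mean 3.4 cm., s.d. 2.61 cm.
for level in (0.95, 0.98, 0.99):
    low, high = fiducial_limits(3.4, 2.61, 900, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f} cm.")
```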


52. Comparison of the means of two large samples 

Given two independent simple samples, of $n_1$ and $n_2$ members
respectively, we may wish to examine whether the difference of
their means may be accounted for by fluctuations of sampling, the
two samples being regarded as drawn from the same population of
s.d. $\sigma$. The s.e.'s of the means of samples of $n_1$ and $n_2$ members
from this population are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$ respectively. Hence, on
the assumption that the samples are independent and drawn from
this population, the s.e. $\epsilon$ of the difference of their means is given by

$$\epsilon^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{12}$$

The sampling distribution of this difference has zero for its mean,
and is approximately normal if $n_1$ and $n_2$ are large. Consequently,
if the observed difference of the means exceeds $3\epsilon$, it can hardly be
ascribed to fluctuations of sampling; and our assumption that the
samples were drawn from the same population is almost certainly
incorrect. If the difference is greater than $2\epsilon$, it is regarded as
significant at the 5 % level of probability. When the variance of the
population is not known, it may be estimated from the combined
sample of $n_1 + n_2$ members, unless the variances of the two samples
are inconsistent with the assumption that they were drawn from
the same population (see § 60 below).

If the two samples are known to have come from different
populations, with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, we can test by a
similar procedure whether the two populations may have the same
mean. The standard errors of the means of samples of $n_1$ and $n_2$
members from the two populations are $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$
respectively; and the s.e. $\epsilon$ of their difference is then given by

$$\epsilon^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{13}$$

On the assumption that the two populations have the same mean,
the distribution of the difference of the means of the samples has
zero for its expected value, and is approximately normal for large
samples. We may therefore test the significance of the difference of
the sample means, by comparing it with $\epsilon$ in the usual manner. As
before, if the variances of the populations are not known, they may
be estimated from the large samples.
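
Formulae (12) and (13) can be put into a short Python sketch. The function names are illustrative assumptions; the figures used (s.d.'s of 2.56 and 2.52 in., samples of 6,400 and 1,600, and an observed difference of 0.70 in.) are those of the example of heights in this section.

```python
import math


def se_difference_common(sigma, n1, n2):
    """s.e. of the difference of two means, common population s.d., as in (12)."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)


def se_difference(sigma1, n1, sigma2, n2):
    """s.e. of the difference of two means, different population s.d.'s, as in (13)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# s.d.'s 2.56 and 2.52 in., samples of 6,400 and 1,600,
# observed difference of means 0.70 in.
se = se_difference(2.56, 6400, 2.52, 1600)
print(round(se, 4), round(0.70 / se, 1))
```

When the two s.d.'s are equal, `se_difference` reduces to `se_difference_common`, as comparison of (12) and (13) requires.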


Example. A simple sample of heights of 6,400 Englishmen has a mean of
67.85 in. and a s.d. of 2.56 in., while a simple sample of heights of 1,600
Australians has a mean of 68.55 in. and a s.d. of 2.52 in. Do the data indicate
that Australians are on the average taller than Englishmen?

The s.e. of the mean of a sample of heights of 6,400 Englishmen is
2.56/80 = 0.032 in., and that of the mean of a sample of heights of 1,600
Australians is 2.52/40 = 0.063 in. The s.e. of the difference of the means is

$$\epsilon = \sqrt{(0.032)^2 + (0.063)^2} = \sqrt{0.004993} = 0.07 \text{ in.}$$

The observed difference between the means of the samples is 0.70 in., which is
10 times its s.e. Hence the data are inconsistent with the assumption that
the means of the two populations are equal; and we conclude that Australians
are on the average taller than Englishmen.

53. Standard error of a partition value

Consider the s.e. of a partition value for a large random sample
of n members, drawn from a continuous population in which the
relative frequency density of the variable is f(x). Let the partition
value be that for which p is the fraction of the frequency lying
above it, and q the fraction below it. Also let $x_p$ be the partition
value for the population, and $x_p + \delta x_p$ that for a large sample of n
items. The relative frequency of values above $x_p$ in the population
is p. In the sample the relative frequency above $x_p + \delta x_p$ is p, and
above $x_p$ it is $p + \delta p$. Thus $\delta p$ is the relative frequency in the sample
for the interval $\delta x_p$. Now for a large sample $\delta x_p$ and $\delta p$ are small;
and $\delta p$ is, to within infinitesimals of higher order, the relative
frequency for the interval $\delta x_p$ in the population. Consequently

$$\delta p = f(x_p)\,\delta x_p = y\,\delta x_p,$$

where y is the ordinate at $x_p$ for the frequency curve y = f(x) of
the population. On squaring we have

$$(\delta p)^2 = y^2 (\delta x_p)^2.$$

Now y is independent of the sample, and $\delta x_p$ and $\delta p$ vary about
zero as mean. Hence, on taking expected values of each side,
we obtain

$$E(\delta x_p)^2 = \frac{1}{y^2} E(\delta p)^2.$$

Now $E(\delta x_p)^2$ is the variance of the sampling distribution of $x_p$, and
its square root is the s.e. of this partition value. Similarly, $E(\delta p)^2$
is the variance of the proportion of successes in n trials, with
constant probability p that the value of the variable selected will be
greater than $x_p$; and we know that this variance is pq/n.
Consequently, on taking the square root of both sides of the above
equation, we obtain the required s.e. $\epsilon$ of the partition value* as

$$\epsilon = \frac{1}{y}\sqrt{\frac{pq}{n}} = \frac{1}{f(x_p)}\sqrt{\frac{pq}{n}}. \tag{14}$$

If the value of y for the population is not given, it may be estimated
from the frequency distribution of the large sample.

* See also Kendall, 1940, 4.

An important case is that in which the distribution of the variable
in the population is normal, with s.d. $\sigma$. The value of p is the area
under the normal curve to the right of the partition value; and the
table of areas in § 21 enables us to read off the value of $x/\sigma$. The
table of ordinates for the normal curve in § 20 then gives the
corresponding value of $\sigma y$; and substitution of the values of y, p, q and n
in (14) gives the required s.e. For example, in the case of the median

$$p = q = \tfrac{1}{2}, \qquad \sigma y = \frac{1}{\sqrt{2\pi}} = 0.3989.$$

Hence the s.e. of the median, for a large sample of n members from
a normal population, is

$$\frac{\sigma}{0.3989}\sqrt{\frac{1}{4n}} = 1.253\,\frac{\sigma}{\sqrt{n}}.$$

This is 25 % greater than the s.e. of the mean. For the upper
quartile p = 1/4; the area from the mean to the quartile is 0.25, giving

$$x/\sigma = 0.6745, \qquad \sigma y = 0.3178.$$

The s.e. of a quartile is therefore

$$\frac{\sigma}{0.3178}\sqrt{\frac{3}{16n}} = 1.363\,\frac{\sigma}{\sqrt{n}}.$$
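
For a normal population the numerical factor in the s.e. of any partition value can be computed directly from (14), in place of the tables of areas and ordinates. The following sketch uses `statistics.NormalDist` from the Python standard library; the function name `se_factor` is an illustrative assumption.

```python
import math
from statistics import NormalDist


def se_factor(p):
    """Factor c such that the s.e. of the partition value with a fraction p of a
    normal population above it is c * sigma / sqrt(n), from (14)."""
    std = NormalDist()
    t = std.inv_cdf(1 - p)          # x / sigma at the partition value
    return math.sqrt(p * (1 - p)) / std.pdf(t)


for name, p in [("median", 0.5), ("upper quartile", 0.25), ("9th decile", 0.1)]:
    print(name, round(se_factor(p), 3))
```

By the symmetry of the normal curve, `se_factor(p)` and `se_factor(1 - p)` agree, so that the lower and upper quartiles, or the 1st and 9th deciles, have the same s.e.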




Example. Prove in the same manner that, for large samples of n members
from a normal population of s.d. $\sigma$,

s.e. of 1st and 9th deciles = $1.71\sigma/\sqrt{n}$,

s.e. of 2nd and 8th deciles = $1.43\sigma/\sqrt{n}$,

s.e. of 3rd and 7th deciles = $1.32\sigma/\sqrt{n}$,

s.e. of 4th and 6th deciles = $1.27\sigma/\sqrt{n}$.

EXAMPLES VI

1 . A biased coin w£is thrown 400 times, and heads resulted 240 
times. Find the s.B. of the proportion of heads in 400 throws; and 
deduce that the probability of throwing heads in a single trial 
almost certainly lies between 0*63 and 0*67. 

2. A random sample of 500 pineapples was taken from a large 
consignment, and 65 were found to be bad. Show that the s.B. of 
the proportion of bad ones in a sample of this size is 0*015; and deduce 
that the percentage of bad pineapples in the consignment almost 
certainly lies between 8*5 and 17*5. 

3. In a random sample of 800 adults from the population of a 
certain large city, 600 are found to have dark hair. In a random 
sample of 1,000 adults from the inhabitants of another large city, 
700 are dark-haired. Show that the difference of the proportions of 
dark-haired people is nearly 2*4 times the S.B, of this difference for 
samples of the above sizes. 

4. In two large populations there are 30 and 25 % respectively 
of fair-haired people. Is this difference likely to be hidden in sam- 
ples of 1,200 and 900 respectively from the two populations? (The 
difference, 0*05, in the proportions is more than 2J times the s.B. 
of this difference for such samples. Hence it is unlikely that the real 
difference will be hidden.) 

5. Given that, on the average, 4 % of insured men of age 65 die 
within a year, and that 60 of a particular group of 1,000 such men 
died within a year, show thatf this group cannot be regarded as a 
representative sample, seeing that the actual deviation of the 
proportion of deaths is more than three times the s.B. of the pro- 
portion for samples of this size. 

6. In sampling of attributes a sample is drawn containing an even 
number n of members. In drawing each of the first |n members the 
probability of success is p, and for each of the remaining ones it is 
1— p, the drawings being independent. Show that the expected 
number of successes is ^n, and the variance of the number of suc- 
cesses is npg. 
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7. The sampling distribution of a certain statistic is normal, 
with mean 2-0 and s.e. 1*5. Show that the probability that a simple 
sample will yield a value of the statistic greater than 6*0 is 0-02275; 
and the probability that the value found will lie outside the range 
— 1*0 to 5-0 is 0*0455. 

Also find c if the probability of getting a value of the statistic 
greater than c is 0*07. {Ans, c = 4-213.) 

8. A normal population has a mean of 0-1 and a s.D. of 2-1. Find 
the probability that the mean of a simple sample of 900 members 
will be negative. {Ans, 0*077 nearly.) 

9. A sample of 900 members is found to have a mean of 3*47 cm. 
Can it be reasonably regarded as a simple sample from a large 
population with mean 3*23 cm. and s.D. 2-31 cm.? {Ans, No. The 
deviation x—fi is more than three times the s.e. of the mean.) 

10. The means of simple samples of 1,000 and 2,000 are 67.5
and 68.0 in. respectively. Can the samples be regarded as drawn
from the same population of s.d. 2.5 in.? (Ans. No. The difference
of the means is more than 5 times the s.e. of the difference, which
is 0.097 in. nearly.)

11. Show that the s.e. of the 8th decile of a simple sample of 900
members drawn from a normal population of s.d. 2.5 cm. is 0.12 cm.
nearly.

12. Prove that the coefficient of correlation between errors in
the partition values corresponding to proportions $p$ and $p'$ of the
frequency lying above them is, for large samples from a continuous
population, $\sqrt{p'q/(pq')}$, in which $p > p'$. In particular, for the lower
and upper quartiles this coefficient is 1/3. (Cf. Yule and Kendall,
1937, 1, p. 385.)

13. Using the value $\epsilon = 1.363\,\sigma/\sqrt{n}$ for the s.e. of the lower (or
upper) quartile in a large sample of n values from a normal population
of s.d. $\sigma$, find the s.e. of the semi-interquartile range.

Since the interquartile range is the difference between the upper
and lower quartiles, and the correlation between errors in these
statistics in a large sample is 1/3 (cf. Ex. 12), the s.e. of the
interquartile range is, in virtue of § 32 (46),

$$\sqrt{\epsilon^2 + \epsilon^2 - \tfrac{2}{3}\epsilon^2} = 2\epsilon/\sqrt{3}.$$

The s.e. of the semi-interquartile range is therefore

$$\epsilon/\sqrt{3} = 1.363\,\sigma/\sqrt{3n} = 0.787\,\sigma/\sqrt{n}.$$

14. The quantities $x_i$ ($i = 1, 2, \ldots, n$) are independent values
chosen at random, one from each of n populations with the same
mean, and with variances $\sigma_i^2$. Show that the linear estimate of the
common mean, which has the least sampling variance, is that
obtained by weighting the x's inversely as their variances. Also
prove that this minimum variance is $H/n$, where $H$ is the harmonic
mean of the variances $\sigma_i^2$. (Cf. Ex. iv, 6.)

15. From a finite population of N values, with variance $\sigma^2$, a
random sample of n values is drawn without replacement. Show
that the sampling variance of the mean of the sample is

$$\frac{(N - n)\,\sigma^2}{n(N - 1)}.$$

16. Variable Poissonian sampling. The three types of sampling
considered in § 47 are all included in the following. Suppose that, in
the drawing of a Poissonian sample, there are various types of
sampling, each type having its own probability. Let $\pi_k$ be the
probability of the kth type of Poissonian sample, $n_k$ the size of a
sample of this type, $p_{kj}$ ($j = 1, \ldots, n_k$) the probabilities of success
at the individual drawings in this type, so that the expected number
of successes in such a sample is $x_k = \sum_j p_{kj}$. In virtue of § 47 (4) the
variance of the number of successes per sample in this type is

$$x_k - \frac{x_k^2}{n_k} - n_k\sigma_k^2,$$

where $\sigma_k^2$ is the variance of the $p_{kj}$ for the kth type of sampling. The
expected number of successes per sample when all types are
possible is

$$\bar{x} = \sum_k \pi_k x_k,$$

and the variance of the number of successes per sample for all types,
being the mean square deviation of the number of successes per
sample from $\bar{x}$, is given by

$$\epsilon^2 = \sum_k \pi_k\left(x_k - \frac{x_k^2}{n_k} - n_k\sigma_k^2\right) + \sigma_x^2,$$

where $\sigma_x^2$ is the variance of the $x_k$ for all types. Hence the general
formula*

$$\epsilon^2 = \bar{x} + \sigma_x^2 - \sum_k \pi_k\left(\frac{x_k^2}{n_k} + n_k\sigma_k^2\right).$$

In the particular case, for example, in which all the $p_{kj}$ have a
common value, p, we have

$$\sigma_k^2 = 0,\quad x_k = pn_k,\quad \bar{x} = p\bar{n},\quad \sigma_x^2 = p^2\sigma_n^2,$$

where $\bar{n}$ and $\sigma_n^2$ are the mean and variance of the $n_k$, and the above
formula becomes

$$\epsilon^2 = p\bar{n} + p^2\sigma_n^2 - p^2\sum_k \pi_k n_k = \bar{n}pq + p^2\sigma_n^2,$$

which is § 47 (8).

17. In a certain population the proportion of members possessing 
a given characteristic is p. Prove that, if p may be varied, the 
probability of obtaining m such members in a simple sample of n 
from the population is greatest when p = m/n. 

18. Prove that, in simple sampling from a population, the 
expected value of the rth moment of the sample about any fixed 
value is equal to the rth moment of the population about that 
value. 

* Cf. Aitken, 1939, 1, pp. 53-4 and Coolidge, 1925, 2, pp. 66-72.


CHAPTER VII


STANDARD ERRORS OF STATISTICS 

54. Notation. Variances of population and sample 

Having already considered the s.e.'s of the mean and the partition
values, we now pass to those of various other statistics. We shall
argue in terms of a population of discrete values $x_i$ ($i = 1, 2, \ldots, k$),
the relative frequency of the value $x_i$ in the population being $p_i$.
The argument covers approximately the case in which the values
of the population are grouped in k classes of small class interval,
the ith class being centred at the value $x_i$ and having a relative
frequency $p_i$. The sum of the relative frequencies of all the other
classes is therefore $q_i$, where as usual $p_i + q_i = 1$. Then, as in § 50,
the mean of the population is given by

$$\mu = \sum p_i x_i \tag{1}$$

and its variance $\sigma^2$ by

$$\sigma^2 = \sum p_i (x_i - \mu)^2 = \sum p_i x_i^2 - \mu^2. \tag{2}$$

Consider now a simple sample of n members drawn from this
population. The sample values fall into the same classes as above;
that is to say, the values $x_i$ are the same for the sample as for the
population, but the relative frequencies of the classes vary with the
sample, fluctuating about the corresponding values for the population.
The probability that, in simple sampling, a value drawn will
belong to the ith class is $p_i$. Hence the frequency $f_i$ of that class, in
a sample of n members, has mean value $np_i$, since this is the expected
value of the number of successes in n independent trials with constant
probability $p_i$ of success. Thus

$$E(f_i) = np_i. \tag{3}$$

The mean $\bar{x}$ of the sample is given by

$$n\bar{x} = \sum f_i x_i \tag{4}$$

and its variance $S^2$ by

$$nS^2 = \sum f_i (x_i - \bar{x})^2 = \sum f_i x_i^2 - n\bar{x}^2. \tag{5}$$




We may notice at the outset that the expected value of $S^2$ in
sampling is not $\sigma^2$ but $(n - 1)\sigma^2/n$. To prove this we observe that
$S^2$, being the variance of the sample, is the second moment of the
sample about $\mu$, diminished by $(\bar{x} - \mu)^2$. Thus

$$S^2 = \frac{1}{n}\sum f_i (x_i - \mu)^2 - (\bar{x} - \mu)^2.$$

Consider the expectation of both members of this equation. In the
first term on the right $(x_i - \mu)^2$ remains constant during sampling;
and therefore, in virtue of (3), the expected value of this term is

$$\frac{1}{n}\sum np_i (x_i - \mu)^2 = \sigma^2.$$

Further, $E(\bar{x} - \mu)^2$ is the sampling variance of the mean, and is
therefore equal to $\sigma^2/n$. Substituting these values we obtain

$$E(S^2) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n - 1}{n}\,\sigma^2 \tag{6}$$

as stated above.

If then we write

$$s^2 = \frac{n}{n - 1}\,S^2 = \frac{1}{n - 1}\sum f_i (x_i - \bar{x})^2 \tag{7}$$

our result is

$$E(s^2) = \sigma^2. \tag{8}$$

We say that $s^2$ is an unbiased estimate of $\sigma^2$, because its mean value
in sampling is $\sigma^2$. It is in this sense a better estimate of the population
variance than $S^2$. Incidentally also it follows from the above
argument that the second moment of the sample, about the mean
of the population, has $\sigma^2$ for its mean value.
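
The results (6) and (8) can be verified exactly for a small population by listing every possible ordered sample. The sketch below does so with rational arithmetic; the class values, relative frequencies and sample size are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

def expected_sample_variance(values, probs, n):
    """Exact E(S^2) over all ordered simple samples of n members."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in idx:
            prob *= probs[i]                      # independent drawings
        xs = [values[i] for i in idx]
        mean = Fraction(sum(xs), n)
        s2 = sum((x - mean) ** 2 for x in xs) / n  # S^2, with divisor n
        total += prob * s2
    return total

values = [0, 1, 3]                                         # hypothetical x_i
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical p_i
n = 4

mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

e_S2 = expected_sample_variance(values, probs, n)  # should be (n - 1) sigma^2 / n
e_s2 = e_S2 * Fraction(n, n - 1)                   # expectation of s^2 = n S^2/(n - 1)
```

Because the enumeration is exact, the agreement with (6) and (8) is an identity and not merely an approximation.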

55. Standard errors of class frequencies 

We have seen that, in simple sampling from the above population,
the probability that a value will fall into the ith class of the sample
is $p_i$, and that the frequency $f_i$ of this class has mean value $np_i$.
The deviation of this frequency from its mean value will be denoted
by $\delta f_i$. The s.e. of $f_i$ is the square root of the expected value of
$(\delta f_i)^2$; and, since $f_i$ is the number of successes in n trials of constant
probability $p_i$, its variance is $np_iq_i$. Thus

$$E(\delta f_i)^2 = np_iq_i = np_i(1 - p_i). \tag{9}$$

If $p_i$ is unknown, an estimate of its value derived from the sample
is $f_i/n$; and, as an approximation to the above sampling variance,
we have

$$E(\delta f_i)^2 = f_i\left(1 - \frac{f_i}{n}\right). \tag{10}$$

But this is a fair approximation only if n is large. A better estimate
is obtained by multiplying this expression by $n/(n - 1)$. To prove
this we consider the expected value of the second member of (10).
Since $E(f_i^2)$ is the second moment of $f_i$ about zero in its sampling
distribution, we have

$$E(f_i^2) = \text{sampling variance of } f_i + [E(f_i)]^2 = np_iq_i + (np_i)^2.$$

Consequently

$$E\left(f_i - \frac{f_i^2}{n}\right) = np_i - \{p_i(1 - p_i) + np_i^2\} = (n - 1)\,p_i(1 - p_i) = \frac{n - 1}{n}\,E(\delta f_i)^2.$$

Hence the formula

$$E(\delta f_i)^2 = \frac{n}{n - 1}\,f_i\left(1 - \frac{f_i}{n}\right) \tag{10'}$$

gives a better estimate than (10), since the expected value of the
second member of (10') is equal to the first member, so that it is an
unbiased estimate of the sampling variance of $f_i$. For a large sample
the factor $n/(n - 1)$ is nearly equal to unity, and (10) gives a
sufficiently good estimate.

56. Covariance of the frequencies in different classes 
Since the sum of the class frequencies in any sample is constant,
for samples of given size n, it follows that

$$\sum_i \delta f_i = 0. \tag{11}$$

The covariance between the frequencies $f_i$ and $f_j$ of the ith and jth
classes is the covariance, $E(\delta f_i\,\delta f_j)$, of their deviations $\delta f_i$ and $\delta f_j$
from their mean values $np_i$ and $np_j$. The calculation of this
covariance may be associated with a correlation table in the values
of $\delta f_i$ and $\delta f_j$. Consider the array in which $\delta f_i$ has a fixed value.
Bearing (11) in mind we assume that, in all the samples for which
$\delta f_i$ has this fixed value, the excess $\delta f_i$ produces an equal deficiency
which is distributed among the other class frequencies, on the
average, in proportion to their expected values; that is to say, we
assume that, in the array with $\delta f_i$ constant, the average value
of $\delta f_j$ is

$$\overline{\delta f_j} = -\frac{np_j}{nq_i}\,\delta f_i = -\frac{p_j}{q_i}\,\delta f_i. \tag{12}$$

In calculating the covariance we may, as proved in § 33, replace
each value of $\delta f_j$ in the array by the mean value $\overline{\delta f_j}$ for that array.
Then

$$E(\delta f_i\,\delta f_j) = -\frac{p_j}{q_i}\,E(\delta f_i)^2 = -np_ip_j \tag{13}$$

in virtue of (9). This is the required covariance.

If the values of $p_i$ and $p_j$ are unknown, an approximation is
obtained by taking them as $f_i/n$ and $f_j/n$. The second member of
(13) is then replaced by $-f_if_j/n$. A better estimate, however, is
given by

$$E(\delta f_i\,\delta f_j) = -\frac{f_if_j}{n - 1}. \tag{14}$$

This may be shown by determining the expected value of the second
member. Thus, in virtue of § 23 (6),

$$E(f_if_j) = E(\delta f_i\,\delta f_j) + E(f_i)\,E(f_j),$$

and therefore, by (13) and (3),

$$E(f_if_j) = n^2p_ip_j - np_ip_j = n(n - 1)\,p_ip_j = -(n - 1)\,E(\delta f_i\,\delta f_j).$$

Consequently, (14) gives an unbiased estimate of the covariance,
since the expected value of the second member is equal to the
covariance. In the case of a large sample, the denominator n − 1 may
be treated as n.
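
The formulae (9) and (13) are reached above by an argument on averages within arrays. As a check, the following sketch computes the same two expectations exactly by enumerating all ordered samples; the relative frequencies and sample size are hypothetical.

```python
from fractions import Fraction
from itertools import product

def frequency_moments(probs, n, i, j):
    """Exact E[(df_i)^2] and E[df_i df_j] over all ordered samples of size n."""
    var_i = Fraction(0)
    cov_ij = Fraction(0)
    for draw in product(range(len(probs)), repeat=n):
        prob = Fraction(1)
        for c in draw:
            prob *= probs[c]
        d_i = draw.count(i) - n * probs[i]   # deviation of f_i from n p_i
        d_j = draw.count(j) - n * probs[j]   # deviation of f_j from n p_j
        var_i += prob * d_i * d_i
        cov_ij += prob * d_i * d_j
    return var_i, cov_ij

probs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # hypothetical p_i
n = 5
var_0, cov_01 = frequency_moments(probs, n, 0, 1)
```

Here `var_0` should equal $np_0(1 - p_0)$ and `cov_01` should equal $-np_0p_1$.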

57. Standard errors in moments about a fixed value 
It is customary to employ Greek symbols for moments of the
population, and the corresponding Italics for those of the sample.
Thus $\mu_r$ and $\mu'_r$ denote respectively the rth moment of the population
about the mean, and about some other specified value, while $m_r$
and $m'_r$ denote the corresponding moments of the sample. For the
rth moment of the sample about $x = 0$ we have

$$nm'_r = \sum_i f_i x_i^r.$$

Hence, owing to deviations $\delta f_i$ of the class frequencies from their
mean values $np_i$, we have a deviation $\delta m'_r$ of the rth moment from
its mean value, given by

$$n\,\delta m'_r = \sum_i x_i^r\,\delta f_i.$$

On squaring both sides we obtain

$$n^2(\delta m'_r)^2 = \sum_i x_i^{2r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^r x_j^r\,\delta f_i\,\delta f_j,$$

where $\sum'$ denotes summation over all integral values of i and j from
1 to k, except those for which $i = j$. Taking expected values of both
members of this equation we deduce, since the values $x_i$ are the
same for each sample,

$$n^2E(\delta m'_r)^2 = \sum_i x_i^{2r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^r x_j^r\,np_ip_j.$$

Consequently, on dividing by n and rearranging,

$$nE(\delta m'_r)^2 = \sum_i p_i x_i^{2r} - \Big(\sum_i p_i x_i^r\Big)^2 = \mu'_{2r} - \mu'^{\,2}_r,$$

the moments $\mu'_{2r}$ and $\mu'_r$ being those of the population. Hence the
required formula

$$E(\delta m'_r)^2 = \frac{\mu'_{2r} - \mu'^{\,2}_r}{n}. \tag{15}$$

If, however, on taking expected values as above we use the estimates
(10') and (14), we obtain the unbiased estimate of the sampling
variance of $m'_r$,

$$E(\delta m'_r)^2 = \frac{m'_{2r} - m'^{\,2}_r}{n - 1}, \tag{16}$$

in terms of the moments of the sample.

Since the mean of a distribution is the first moment about $x = 0$,
we may deduce the variance of the mean of a sample of n members
by putting $r = 1$ in the above. Thus (15) becomes

$$E(\delta\bar{x})^2 = \frac{\mu'_2 - \mu'^{\,2}_1}{n} = \frac{\sigma^2}{n} \tag{17}$$

and (16) gives, in terms of sample moments,

$$E(\delta\bar{x})^2 = \frac{m'_2 - m'^{\,2}_1}{n - 1} = \frac{S^2}{n - 1} = \frac{s^2}{n} \tag{18}$$

in agreement with (8).
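
The formulae (15) and (16) translate directly into code. The following is a minimal sketch with illustrative function names; the population and the sample used to exercise it are hypothetical. Putting r = 1 recovers $\sigma^2/n$ and $s^2/n$, as in (17) and (18).

```python
import statistics

def population_moment(values, probs, r):
    """rth moment of the population about x = 0."""
    return sum(p * x ** r for x, p in zip(values, probs))

def var_moment_population(values, probs, r, n):
    """Sampling variance of the rth sample moment about 0, formula (15)."""
    mu_2r = population_moment(values, probs, 2 * r)
    mu_r = population_moment(values, probs, r)
    return (mu_2r - mu_r ** 2) / n

def var_moment_estimate(sample, r):
    """Unbiased estimate of the same variance from a sample, formula (16)."""
    n = len(sample)
    m_r = sum(x ** r for x in sample) / n
    m_2r = sum(x ** (2 * r) for x in sample) / n
    return (m_2r - m_r ** 2) / (n - 1)

values, probs = [0.0, 1.0, 3.0], [0.5, 0.3, 0.2]   # hypothetical population
sample = [2.0, 3.5, 1.0, 4.0, 2.5]                 # hypothetical sample

mu = population_moment(values, probs, 1)
sigma2 = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

var_mean_pop = var_moment_population(values, probs, 1, n=25)  # (17): sigma^2 / n
var_mean_est = var_moment_estimate(sample, 1)                 # (18): s^2 / n
s2_over_n = statistics.variance(sample) / len(sample)         # s^2 uses divisor n - 1
```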

58. Covariance of moments of different orders about a fixed value 
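The calculation of the covariance of the qth and rth moments of
the sample about the fixed value $x = 0$ is similar to the above. With
the same notation we have

$$nm'_q = \sum f_i x_i^q, \qquad nm'_r = \sum f_i x_i^r,$$

and therefore

$$n\,\delta m'_q = \sum x_i^q\,\delta f_i, \qquad n\,\delta m'_r = \sum x_i^r\,\delta f_i.$$

On multiplying corresponding sides of these equations we obtain

$$n^2\,\delta m'_q\,\delta m'_r = \sum_i x_i^{q+r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^q x_j^r\,\delta f_i\,\delta f_j.$$

Now take the expected value of each member of the equation.
Then, in virtue of (9) and (13),

$$n^2E(\delta m'_q\,\delta m'_r) = \sum_i x_i^{q+r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^q x_j^r\,np_ip_j,$$

and consequently, on dividing by n and rearranging,

$$nE(\delta m'_q\,\delta m'_r) = \sum_i p_i x_i^{q+r} - \Big(\sum_i p_i x_i^q\Big)\Big(\sum_i p_i x_i^r\Big) = \mu'_{q+r} - \mu'_q\mu'_r.$$

Hence the required formula

$$E(\delta m'_q\,\delta m'_r) = \frac{1}{n}\,(\mu'_{q+r} - \mu'_q\mu'_r) \tag{19}$$

in terms of the moments of the population. If, however, on taking
expected values as above, we use the approximations (10') and (14),
we obtain the unbiased estimate

$$E(\delta m'_q\,\delta m'_r) = \frac{1}{n - 1}\,(m'_{q+r} - m'_q m'_r) \tag{20}$$

in terms of moments of the sample.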
59. Standard errors of the variance and the standard deviation
of a large sample

The mean of a sample changes with the sample. Hence we cannot
use (15) to obtain the sampling variance of the second moment
about the mean of the sample. In the case of large samples we may
obtain the required result as follows. The variance of the sample
is connected with the second moment about $x = 0$ by the usual
relation

$$m_2 = m'_2 - \bar{x}^2.$$

Consequently $\delta m_2 = \delta m'_2 - 2\bar{x}\,\delta\bar{x}$,

and therefore, on squaring,

$$(\delta m_2)^2 = (\delta m'_2)^2 - 4\bar{x}\,\delta\bar{x}\,\delta m'_2 + 4\bar{x}^2(\delta\bar{x})^2, \tag{i}$$

where

$$\delta m'_2 = \frac{1}{n}\sum_i x_i^2\,\delta f_i, \qquad \delta\bar{x} = \frac{1}{n}\sum_i x_i\,\delta f_i. \tag{ii}$$

Suppose now that the origin of x is taken to coincide with the mean
of the population. Then $\bar{x}$ becomes identical with $\delta\bar{x}$, for it is now the
deviation of the mean of the sample from the mean of the population.
If we take expected values of both members of (i) we may
show that, when n is large, the contributions of the second and third
terms on the right are small compared with that of the first term.
By (15), since the origin is now the mean of the population, the
expected value of $(\delta m'_2)^2$ is $(\mu_4 - \mu_2^2)/n$. Also $\bar{x}^2(\delta\bar{x})^2$ is $(\delta\bar{x})^4$, and its
expected value is of the order $1/n^2$, in virtue of (17). Similarly,
$\bar{x}\,\delta\bar{x}\,\delta m'_2$ is $(\delta\bar{x})^2\,\delta m'_2$, and its expected value is of a smaller order
than $1/n$, in virtue of (15) and (17). If then n is large, the expectation
of the first term on the right of (i) is large compared with those of the
second and third terms, and we have the approximate formula*

$$E(\delta m_2)^2 = \frac{\mu_4 - \mu_2^2}{n}. \tag{21}$$

* Fisher has shown that, for samples of any size, the exact sampling
variance of $s^2$ (the unbiased estimate of population variance) is

$$(\mu_4 - 3\mu_2^2)/n + 2\mu_2^2/(n - 1) \quad \text{(Fisher, 1929, 1, p. 206).}$$

For a normal population this expression has the value $2\sigma^4/(n - 1)$.

The s.e. of the variance of the sample is the square root of this
quantity. In the particular case of a normal population, with s.d. $\sigma$,
we have $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$; so that, for large samples from a
normal population,

$$E(\delta m_2)^2 = 2\sigma^4/n. \tag{22}$$

From (21) may be deduced the s.e. of the s.d. of large samples.
For the variance of the sample we have, with the above notation,
$m_2 = S^2$, and therefore

$$\delta m_2 = 2S\,\delta S.$$

For large samples the factor S may be taken as equal to the population
parameter $\sigma$, $\delta S$ being small. Hence on squaring, and taking
expected values of both members, we have

$$E(\delta m_2)^2 = 4\sigma^2E(\delta S)^2,$$

and therefore, in virtue of (21),

$$E(\delta S)^2 = \frac{\mu_4 - \mu_2^2}{4n\mu_2}. \tag{23}$$

In the case of large samples from a normal population this is simply

$$E(\delta S)^2 = \frac{\sigma^2}{2n}, \tag{24}$$

and the s.e. of the s.d. is, in this case, $\sigma/\sqrt{2n}$.
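
The large-sample formulae (21), (23) and (24) may be put in computational form as follows. This is a sketch with illustrative function names; the values of $\sigma$ and n are hypothetical. It also confirms that the general formula (23) reduces to (24) when the normal values $\mu_4 = 3\sigma^4$, $\mu_2 = \sigma^2$ are inserted.

```python
import math

def se_variance(mu4, mu2, n):
    """Large-sample s.e. of the sample variance, from (21)."""
    return math.sqrt((mu4 - mu2 ** 2) / n)

def se_sd(mu4, mu2, n):
    """Large-sample s.e. of the sample s.d., from (23)."""
    return math.sqrt((mu4 - mu2 ** 2) / (4 * n * mu2))

def se_sd_normal(sigma, n):
    """Normal-population special case (24): sigma / sqrt(2n)."""
    return sigma / math.sqrt(2 * n)

sigma, n = 2.0, 900                                # hypothetical values
general = se_sd(3 * sigma ** 4, sigma ** 2, n)     # (23) with normal moments
normal = se_sd_normal(sigma, n)                    # (24)
se_var = se_variance(3 * sigma ** 4, sigma ** 2, n)  # square root of (22)
```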


60. Comparison of the standard deviations of two large samples 
For two independent large samples of $n_1$ and $n_2$ members from
the same population, the variance $\epsilon^2$ of the difference of their
standard deviations, being the sum of the variances of their standard
deviations, is in virtue of (23),

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{25}$$

In particular, if the population is normal, this equation takes the
simple form

$$\epsilon^2 = \frac{\sigma^2}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \tag{26}$$

The expected value of the difference of the s.d.'s of two such
samples is zero. Hence, by comparing the actual difference $S_1 - S_2$
with the s.e. $\epsilon$, we may test in the usual manner the credibility of
the hypothesis that the samples were drawn from the same population.

For two large simple samples from different populations, whose
moments are $\mu_4$, $\mu_2$ and $\bar\mu_4$, $\bar\mu_2$ respectively, the s.e. of the difference
of their standard deviations is given by

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2 n_1} + \frac{\bar\mu_4 - \bar\mu_2^2}{4\bar\mu_2 n_2}. \tag{27}$$

And in particular, if the populations are normal with standard
deviations $\sigma_1$ and $\sigma_2$ respectively,

$$\epsilon^2 = \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}. \tag{28}$$

Example. The s.d. of a simple sample of 2,000 members is 5.9 years, and
that of an independent sample of 2,500 members is 6.1 years. May the
samples be reasonably regarded as from the same normal population?

On the hypothesis that they are from the same normal population, an
approximate value for its s.d. is 6 years, and the s.e. of the difference of the
standard deviations of samples of the above sizes is, by (26),

$$\epsilon = \sigma\sqrt{1/(2n_1) + 1/(2n_2)} = 6(0.0212) = 0.127 \text{ year.}$$

The actual difference of 0.2 year is less than $1.6\,\epsilon$, and is therefore not
significant at the 5 % level of probability. There is thus no real evidence against
the hypothesis.
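
The arithmetic of this example can be reproduced in a few lines. The sketch below assumes sample standard deviations of 5.9 and 6.1 years, as implied by the quoted difference of 0.2 year and the pooled value of 6 years; the function name is illustrative.

```python
import math

def se_diff_sd_normal(sigma, n1, n2):
    """s.e. of the difference of two sample s.d.'s, normal population, from (26)."""
    return sigma * math.sqrt(1 / (2 * n1) + 1 / (2 * n2))

s1, n1 = 5.9, 2000      # first sample: s.d. in years, size
s2, n2 = 6.1, 2500      # second sample
sigma = 6.0             # approximate common s.d. under the hypothesis

eps = se_diff_sd_normal(sigma, n1, n2)
ratio = abs(s1 - s2) / eps   # observed difference in units of its s.e.
```

The ratio falls short of 1.6, in agreement with the conclusion drawn in the example.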


Sampling from a Bivariate Population 

61. Sampling covariance of the means of the variables 

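Suppose now that the population is bivariate, with $p_i$ as relative
frequency of the pair of values $(x_i, y_i)$, or of the class with centre
at that point. The means of x and y in the population are then

$$\mu = \sum_i p_i x_i, \qquad \mu' = \sum_i p_i y_i,$$

and the moments of orders q and r in x and y respectively are

$$\mu'_{q,r} = \sum p_i x_i^q y_i^r, \qquad \mu_{q,r} = \sum p_i (x_i - \mu)^q (y_i - \mu')^r,$$

the first about the point (0, 0), and the second about the mean of
the population. In particular $\mu_{1,1}$ is the covariance of x and y in
the population.

In a simple sample of n pairs of values, let $f_i$ be the frequency of
the ith class. Then the values of the moments of the sample are
given by

$$nm'_{q,r} = \sum f_i x_i^q y_i^r, \qquad nm_{q,r} = \sum f_i (x_i - \bar{x})^q (y_i - \bar{y})^r,$$

the covariance of x and y in the sample being $m_{1,1}$. The argument
of §§ 55 and 56 still holds, and the formulae

$$E(\delta f_i)^2 = np_i(1 - p_i), \qquad E(\delta f_i\,\delta f_j) = -np_ip_j$$

are still valid.

Corresponding to (17) we may now prove that the covariance of
the means $\bar{x}$, $\bar{y}$, in samples from a bivariate population, is $\mu_{1,1}/n$. For

$$n\bar{x} = \sum f_i x_i, \qquad n\bar{y} = \sum f_i y_i.$$

Hence the deviations $\delta\bar{x}$, $\delta\bar{y}$ from $\mu$, $\mu'$ respectively, and the
deviations $\delta f_i$ from their mean values $np_i$, are connected by

$$n\,\delta\bar{x} = \sum x_i\,\delta f_i, \qquad n\,\delta\bar{y} = \sum y_i\,\delta f_i,$$

and therefore on multiplication

$$n^2\,\delta\bar{x}\,\delta\bar{y} = \sum_i x_i y_i (\delta f_i)^2 + {\sum_{i,j}}' x_i y_j\,\delta f_i\,\delta f_j.$$

Taking the expected value of each member we deduce

$$n^2E(\delta\bar{x}\,\delta\bar{y}) = \sum_i x_i y_i\,np_i(1 - p_i) - {\sum_{i,j}}' x_i y_j\,np_ip_j,$$

and therefore

$$nE(\delta\bar{x}\,\delta\bar{y}) = \sum_i p_i x_i y_i - \Big(\sum_i p_i x_i\Big)\Big(\sum_j p_j y_j\Big) = \mu'_{1,1} - \mu\mu' = \mu_{1,1}.$$

Hence the required result

$$E(\delta\bar{x}\,\delta\bar{y}) = \mu_{1,1}/n. \tag{29}$$

From this it follows immediately that the coefficient of correlation
between $\bar{x}$ and $\bar{y}$ is equal to the correlation $\rho$ between x and y in the
population. For the correlation between $\bar{x}$ and $\bar{y}$ is

$$\frac{\text{covariance of } \bar{x} \text{ and } \bar{y}}{(\text{s.e. of } \bar{x})(\text{s.e. of } \bar{y})} = \frac{\mu_{1,1}/n}{(\sigma_1/\sqrt{n})(\sigma_2/\sqrt{n})} = \frac{\mu_{1,1}}{\sigma_1\sigma_2} = \rho,$$

$\sigma_1^2$ and $\sigma_2^2$ being the variances of x and y in the population.

We may also prove that the expected value of the covariance of
x and y in the sample is given by

$$E(m_{1,1}) = \frac{n - 1}{n}\,\mu_{1,1}, \tag{30}$$

which corresponds to (6). For, by the known relation, § 23 (5),

$$m_{1,1} = \frac{1}{n}\sum f_i (x_i - \mu)(y_i - \mu') - (\bar{x} - \mu)(\bar{y} - \mu').$$

Taking the expected value of each member we have, in virtue of (29),

$$E(m_{1,1}) = \sum p_i (x_i - \mu)(y_i - \mu') - \frac{\mu_{1,1}}{n} = \frac{n - 1}{n}\,\mu_{1,1},$$

as required.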




62. Variance and covariance of moments about a fixed point 
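The moment of the sample about the fixed point (0, 0), of order
q in x and r in y, is given by

$$nm'_{q,r} = \sum_i f_i x_i^q y_i^r,$$

and therefore, with the usual notation,

$$n\,\delta m'_{q,r} = \sum_i x_i^q y_i^r\,\delta f_i.$$

Squaring both sides, and taking expected values as in § 57, we deduce

$$n^2E(\delta m'_{q,r})^2 = \sum_i x_i^{2q} y_i^{2r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^q y_i^r x_j^q y_j^r\,np_ip_j,$$

and therefore

$$E(\delta m'_{q,r})^2 = \frac{1}{n}\,(\mu'_{2q,2r} - \mu'^{\,2}_{q,r}) \tag{31}$$

in terms of the moments of the population. This is the sampling
variance of $m'_{q,r}$. If, in taking expected values, we use the
approximations (10') and (14), we obtain the unbiased estimate

$$E(\delta m'_{q,r})^2 = \frac{1}{n - 1}\,(m'_{2q,2r} - m'^{\,2}_{q,r}) \tag{32}$$

in terms of the moments of the sample.

Similarly, we may find the sampling covariance of the moments
$m'_{q,r}$ and $m'_{s,t}$ about (0, 0), by multiplying together the expressions
for $\delta m'_{q,r}$ and $\delta m'_{s,t}$, and taking expected values of both sides.
Proceeding as above we obtain the result

$$E(\delta m'_{q,r}\,\delta m'_{s,t}) = \frac{1}{n}\,(\mu'_{q+s,\,r+t} - \mu'_{q,r}\,\mu'_{s,t}). \tag{33}$$

In particular by putting $r = s = 0$ and $q = t = 1$, and observing
that $\mu'_{1,0} = \mu$, $\mu'_{0,1} = \mu'$, we find again the covariance of the means

$$E(\delta\bar{x}\,\delta\bar{y}) = \frac{1}{n}\,(\mu'_{1,1} - \mu\mu') = \frac{\mu_{1,1}}{n}.$$

As in other cases we may deduce the unbiased estimate
corresponding to (33), with sample moments and denominator n − 1.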

63. Standard error of the covariance of a large sample 

By argument similar to that of § 59 we may find the s.e. of the
covariance of a large sample from a bivariate population. For, with
the usual notation, since

$$m_{1,1} = m'_{1,1} - \bar{x}\bar{y},$$

we have

$$\delta m_{1,1} = \delta m'_{1,1} - \bar{y}\,\delta\bar{x} - \bar{x}\,\delta\bar{y}.$$

If now we take the origin at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$ and $\bar{y}$ with $\delta\bar{y}$. Consequently, if we retain only the
principal terms we have, on squaring the last equation and taking
expected values,

$$E(\delta m_{1,1})^2 = E(\delta m'_{1,1})^2,$$

and, since the origin is at the mean of the population, this is, in
virtue of (31),

$$E(\delta m_{1,1})^2 = \frac{1}{n}\,(\mu_{2,2} - \mu_{1,1}^2),$$

giving the square of the s.e. of the covariance of the sample.

64. Standard error of the coefficient of correlation 

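The s.e. of the coefficient of correlation, r, for large samples of
n pairs from a bivariate population, may be found as follows.*

* Cf. Bowley, 1920, 2, pp. 422-3.

Omitting the comma between the two subscripts in the moment
symbols we have, by the usual formula,

$$r = \frac{m_{11}}{\sqrt{m_{20}\,m_{02}}},$$

and therefore, by logarithmic differentiation,

$$\frac{\delta r}{r} = \frac{\delta m_{11}}{m_{11}} - \frac{1}{2}\,\frac{\delta m_{20}}{m_{20}} - \frac{1}{2}\,\frac{\delta m_{02}}{m_{02}}. \tag{i}$$

If the origin is taken at the mean of the population, so that $\bar{x}$ and
$\bar{y}$ are identical with $\delta\bar{x}$ and $\delta\bar{y}$, and we retain only small quantities
of the first order, we find as in the preceding section

$$\delta m_{11} = \delta m'_{11} = \frac{1}{n}\sum_i x_i y_i\,\delta f_i.$$

Similarly, from the relations

$$m_{20} = m'_{20} - \bar{x}^2, \qquad m_{02} = m'_{02} - \bar{y}^2,$$

we find, on retaining only the principal terms,

$$\delta m_{20} = \frac{1}{n}\sum_i x_i^2\,\delta f_i, \qquad \delta m_{02} = \frac{1}{n}\sum_i y_i^2\,\delta f_i.$$

Now substitute these values in (i). Then, since n is large and we are
retaining only small quantities of the first order, we may replace
the denominators in (i) by the moments and correlation coefficient
$\rho$ for the population. Thus

$$\frac{\delta r}{\rho} = \frac{1}{n}\sum_i\left(\frac{x_i y_i}{\mu_{11}} - \frac{x_i^2}{2\mu_{20}} - \frac{y_i^2}{2\mu_{02}}\right)\delta f_i = \frac{1}{n}\sum_i F(x_i, y_i)\,\delta f_i,$$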

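where $F(x_i, y_i)$ denotes the expression in brackets. Squaring both
sides we have

$$\left(\frac{\delta r}{\rho}\right)^2 = \frac{1}{n^2}\Big[\sum_i F^2(x_i, y_i)(\delta f_i)^2 + {\sum_{i,j}}' F(x_i, y_i)\,F(x_j, y_j)\,\delta f_i\,\delta f_j\Big],$$

and therefore, on taking expected values,

$$E\left(\frac{\delta r}{\rho}\right)^2 = \frac{1}{n}\Big[\sum_i F^2(x_i, y_i)\,p_i(1 - p_i) - {\sum_{i,j}}' F(x_i, y_i)\,F(x_j, y_j)\,p_ip_j\Big] = \frac{1}{n}\Big[\sum_i p_i F^2(x_i, y_i) - \Big\{\sum_i p_i F(x_i, y_i)\Big\}^2\Big].$$

Now the second sum on the right is equal to zero. For, the origin
being at the mean of the population,

$$\sum_i p_i F(x_i, y_i) = \frac{\mu_{11}}{\mu_{11}} - \frac{\mu_{20}}{2\mu_{20}} - \frac{\mu_{02}}{2\mu_{02}} = 0.$$

On substituting the value of $F^2(x_i, y_i)$ we deduce the sampling
variance of r in the form

$$E(\delta r)^2 = \frac{\rho^2}{n}\left[\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right]. \tag{35}$$

This result assumes a very simple form in the case of a bivariate
normal population. The parameter values are then

$$\mu_{22} = (1 + 2\rho^2)\,\sigma_1^2\sigma_2^2, \quad \mu_{20} = \sigma_1^2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4,$$

$$\mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3,$$

and on substituting these values we obtain

$$E(\delta r)^2 = \frac{(1 - \rho^2)^2}{n}.$$

Consequently

$$\text{s.e. of } r = \frac{1 - \rho^2}{\sqrt{n}}. \tag{36}$$

These formulae hold for large values of n. Their application,
however, is limited by the fact that the sampling distribution of r
is not even approximately normal when r is fairly large. The s.e.
test for r should therefore be applied only for large samples and
moderate values of r. A much more useful test, which is applicable
to small or large samples, will be considered in Chapter x.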
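
The passage from (35) to (36) involves a fair amount of algebra, and a numerical check is reassuring. The sketch below evaluates (35) with the bivariate normal moments quoted above and compares the result with (36); the function names and the values of $\rho$, $\sigma_1$, $\sigma_2$ and n are illustrative assumptions.

```python
import math

def var_r(rho, n, m):
    """Large-sample variance of r, formula (35); m maps (i, j) to mu_ij."""
    bracket = (m[2, 2] / m[1, 1] ** 2
               + m[4, 0] / (4 * m[2, 0] ** 2)
               + m[0, 4] / (4 * m[0, 2] ** 2)
               + m[2, 2] / (2 * m[2, 0] * m[0, 2])
               - m[3, 1] / (m[1, 1] * m[2, 0])
               - m[1, 3] / (m[1, 1] * m[0, 2]))
    return rho ** 2 * bracket / n

def normal_moments(rho, s1, s2):
    """Central moments of a bivariate normal population, as quoted in the text."""
    return {
        (2, 0): s1 ** 2, (0, 2): s2 ** 2, (1, 1): rho * s1 * s2,
        (4, 0): 3 * s1 ** 4, (0, 4): 3 * s2 ** 4,
        (2, 2): (1 + 2 * rho ** 2) * s1 ** 2 * s2 ** 2,
        (3, 1): 3 * rho * s1 ** 3 * s2, (1, 3): 3 * rho * s1 * s2 ** 3,
    }

rho, n = 0.45, 3600          # hypothetical correlation and sample size
se_general = math.sqrt(var_r(rho, n, normal_moments(rho, 1.5, 0.8)))
se_normal = (1 - rho ** 2) / math.sqrt(n)   # formula (36)
```

The two values coincide whatever standard deviations are chosen, since $\sigma_1$ and $\sigma_2$ cancel from every term of (35).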
EXAMPLES VII

1. In a sample of 10,000 from a normal population the s.d. is
2.52 cm. Show that the s.e. of this quantity is 0.018 cm. nearly, and
hence that the s.d. of the population almost certainly lies between
2.46 and 2.58 cm.

2. A random sample of 6,400 members from a certain population
has a s.d. of 6.80 years, and a fourth moment of 3298 yr.⁴ Show that
three times the s.e. of the s.d. is 0.094 nearly, and hence that the
s.d. of the population almost certainly lies between 6.7 and 6.9 years.

3. The s.d. of a random sample of 1,000 members is 5.9 years,
and that of an independent sample of 900 members is 6.1 years.
Show that the samples may reasonably be regarded as drawn from
equally variable normal populations, since the difference of their
standard deviations is very little greater than its s.e.

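4. Standard error of the coefficient of variation. By definition this
coefficient is $V = S/\bar{x} = \sqrt{m_2}/\bar{x}$. Logarithmic differentiation gives

$$\frac{\delta V}{V} = \frac{\delta m_2}{2m_2} - \frac{\delta\bar{x}}{\bar{x}}.$$

Squaring both sides and taking expected values show that, for
large samples,

$$\frac{E(\delta V)^2}{V^2} = \frac{E(\delta m_2)^2}{4\mu_2^2} + \frac{E(\delta\bar{x})^2}{\mu^2} - \frac{E(\delta\bar{x}\,\delta m_2)}{\mu\mu_2},$$

the sample values in the denominators being replaced by those of the
population, and the first two expectations being given by (21) and (17).

It can be shown that, for samples from a normal population,
$E(\delta\bar{x}\,\delta m_2) = 0$. Show that in this case the above result leads to

$$\text{s.e. of } V = V\sqrt{(1 + 2V^2)/2n}.$$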

5. Standard error of a moment about the mean of a large sample.
Proceeding as in § 59 we have

$$nm_q = \sum f_i (x_i - \bar{x})^q = \sum f_i x_i^q - q\bar{x}\sum f_i x_i^{q-1} + \ldots,$$

and therefore

$$\delta m_q = \delta m'_q - q\,m'_{q-1}\,\delta\bar{x} - \ldots.$$

If the origin is taken at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$; and, as we need retain only the terms of lowest
order, we neglect those after the first two. On squaring we may
write the result

$$(\delta m_q)^2 = (\delta m'_q)^2 - 2q\mu_{q-1}\,\delta\bar{x}\,\delta m'_q + q^2\mu_{q-1}^2(\delta\bar{x})^2,$$

and on taking expected values of both sides we obtain, in virtue
of (15) and (19), the origin being the mean of the population,

$$E(\delta m_q)^2 = \frac{1}{n}\,(\mu_{2q} - \mu_q^2 - 2q\mu_{q-1}\mu_{q+1} + q^2\mu_{q-1}^2\mu_2).$$

The square root of this quantity is the s.e. of $m_q$.

6. Deduce that, for large samples from a normal population of
s.d. $\sigma$, the s.e.'s of $m_3$ and $m_4$ are $\sigma^3\sqrt{6/n}$ and $\sigma^4\sqrt{96/n}$ respectively.

7. A random sample of 3,600 pairs of values from a bivariate
normal population showed a correlation coefficient of 0.45. Find
limits to the correlation in the population.

Since the sample is large we may take 0.45 as an approximate
value for $\rho$. The s.e. of r is then (1 − 0.2025)/60 = 0.0133, so that
$3\epsilon$ = 0.04 nearly. The coefficient of correlation in the population
almost certainly lies between 0.41 and 0.49.

8. A random sample of 2,500 pairs of values from a bivariate
normal population showed a correlation of 0.1. Is this really
significant of correlation in the population?

On the assumption that the population is uncorrelated we have
the s.e. of r = 1/50 = 0.02. The actual value is 5 times this s.e.,
and is therefore significant.

Show that the above correlation of 0.1 would not be significant
in a sample of less than 400 pairs.

9. By means of the result of Examples vi, 18 (p. 129) deduce*
the formulae (15) of § 57 and (19) of § 58.

* Cf. Kendall, 1943, 2, pp. 205-6.
CHAPTER VIII


BETA AND GAMMA DISTRIBUTIONS 


65. Beta and Gamma Functions 

For the benefit of the student who is not familiar with them, we
shall first prove the elementary properties of the Beta and Gamma
functions. The integral

$$\Gamma(n) = \int_0^\infty e^{-x}x^{n-1}\,dx \tag{1}$$

converges if n is positive. It is a function of n called the Gamma
Function. Clearly

$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1. \tag{2}$$

Also, if n − 1 is positive, we have on integration by parts

$$\Gamma(n) = \Big[-e^{-x}x^{n-1}\Big]_0^\infty + \int_0^\infty e^{-x}(n - 1)\,x^{n-2}\,dx,$$

so that

$$\Gamma(n) = (n - 1)\,\Gamma(n - 1). \tag{3}$$

Hence, if n is a positive integer,

$$\Gamma(n) = (n - 1)(n - 2)\ldots 2\cdot 1\cdot\Gamma(1) = (n - 1)!. \tag{4}$$

On account of the property expressed by (3) and (4), $\Gamma(n)$ is often
denoted by $(n - 1)!$ whether n is integral or not. Also, on writing
$x^2$ in place of $x$ in (1), we have the alternative formula

$$\Gamma(n) = 2\int_0^\infty x^{2n-1}\exp(-x^2)\,dx. \tag{5}$$

And, by an obvious substitution, it is easily verified that, if a is
positive,

$$\int_0^\infty e^{-ax}x^{n-1}\,dx = a^{-n}\,\Gamma(n). \tag{6}$$
The integral

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx \tag{7}$$

also converges if m and n are positive. It is a function of m and n
called the Beta Function. That it is symmetrical in m and n is
easily shown by the substitution $z = 1 - x$. Then (7) becomes

$$B(m, n) = \int_0^1 (1 - z)^{m-1}z^{n-1}\,dz = B(n, m).$$

Further, on substituting $x = \sin^2\theta$ in (7) we obtain

$$B(m, n) = 2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta, \tag{8}$$

and therefore in particular

$$B(\tfrac12, \tfrac12) = 2\int_0^{\pi/2}d\theta = \pi, \tag{9}$$

while from (7) it is obvious that

$$B(1, 1) = 1. \tag{10}$$

Lastly the substitution $x = 1/(1 + y)$ in (7) leads to an important
alternative definition, viz.

$$B(m, n) = \int_0^\infty\frac{y^{n-1}\,dy}{(1 + y)^{m+n}}, \tag{11}$$

and in this integral m and n may be interchanged, in virtue of the
symmetry of the function.

Example. Show that

$$B(m, n) = \int_0^1\frac{x^{m-1} + x^{n-1}}{(1 + x)^{m+n}}\,dx.$$

This may be deduced from (11) by dividing the range of integration into two
parts, 0 to 1 and 1 to ∞, and putting $y = 1/x$ in the integration over the
second part.

66. Relation between the two functions 

That the Beta and Gamma functions are connected by the relation

$$B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)} \tag{12}$$

may be proved as follows. Consider the integrals

$$I_1 = 2\int_0^a x^{2m-1}\exp(-x^2)\,dx, \qquad I_2 = 2\int_0^a y^{2n-1}\exp(-y^2)\,dy,$$

whose limiting values as a tends to infinity are $\Gamma(m)$ and $\Gamma(n)$
respectively, in virtue of (5). Then

$$I_1I_2 = 4\int_0^a dx\int_0^a \exp(-x^2 - y^2)\,x^{2m-1}y^{2n-1}\,dy,$$

or, on changing to polar coordinates,

$$I_1I_2 = 4\iint \exp(-r^2)\,r^{2m+2n-1}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,dr\,d\theta,$$

the integration extending over the square OACB (see Fig. 4, p. 66).
Since the integrand is positive, this integral is intermediate in value
between the integrals of the same function extended over the
quadrant OAB, of radius a, and the quadrant OPQ, of radius $a\sqrt{2}$.
Hence $I_1I_2$ lies in value between

$$4\int_0^{\pi/2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\int_0^a \exp(-r^2)\,r^{2m+2n-1}\,dr,$$

and the corresponding integral with 0 and $a\sqrt{2}$ as the limits for r.
But, as a tends to infinity, each of these integrals tends to the limit
$B(m, n)\,\Gamma(m + n)$, while $I_1$ and $I_2$ tend to the limits $\Gamma(m)$ and $\Gamma(n)$
respectively. Consequently

$$B(m, n)\,\Gamma(m + n) = \Gamma(m)\,\Gamma(n)$$

as stated in (12).

Putting $m = n = \tfrac12$ in this result we have, in virtue of (2) and (9),

$$\pi = \Gamma(\tfrac12)\,\Gamma(\tfrac12).$$

Consequently

$$\Gamma(\tfrac12) = \sqrt{\pi}, \tag{13}$$

or

$$\int_0^\infty\frac{e^{-x}}{\sqrt{x}}\,dx = \sqrt{\pi}.$$

By writing $x^2$ in place of $x$ in this integral, or by putting $n = \tfrac12$ in
(5), we deduce

$$\int_0^\infty\exp(-x^2)\,dx = \tfrac12\sqrt{\pi}. \tag{14}$$

Example 1. Show that, if m is a positive integer,

$$\Gamma(m + \tfrac12) = \frac{2m - 1}{2}\cdot\frac{2m - 3}{2}\ldots\frac{3}{2}\cdot\frac{1}{2}\,\sqrt{\pi}.$$

Example 2. Show that, if m and n are positive integers,

$$B(m, n) = \frac{(m - 1)!\,(n - 1)!}{(m + n - 1)!}.$$

In particular, $B(1, n) = 1/n$.

Example 3. Show that

$$B(m + 1, n)/B(m, n) = m/(m + n)$$

and

$$B(m + 2, n)/B(m, n) = m(m + 1)/(m + n)(m + n + 1).$$

Example 4. Show that

$$B(m + 2, n - 2)/B(m, n) = m(m + 1)/(n - 1)(n - 2).$$

Example 5. Show that

$$\int_0^a x^{m-1}(a - x)^{n-1}\,dx = a^{m+n-1}\,B(m, n).$$
67. Gamma distribution and Gamma variates

In virtue of (1) a continuous variable x, which is distributed
with probability density

$$\phi(x) = \frac{e^{-x}x^{l-1}}{\Gamma(l)} \tag{15}$$

throughout the range 0 to ∞, is called a Gamma variate with
parameter l; and its distribution is a Gamma distribution.* The factor
$1/\Gamma(l)$ ensures that the integral of $\phi(x)$ over the whole range of
values of x is unity. The reader should sketch the probability curve
$y = \phi(x)$ for the distribution. He will find that it is asymptotic to
the x-axis and that, if $l > 1$, it has a mode at $x = l - 1$. If $l > 2$ it also
touches the x-axis at the origin; while, if $1 < l < 2$, it is tangent to the
y-axis at that point. If, however, $0 < l < 1$, the curve is asymptotic
to both axes.

The expected value of the variate in the distribution is given by

$$E(x) = \int_0^\infty x\,\phi(x)\,dx = \frac{\Gamma(l + 1)}{\Gamma(l)} = l. \tag{16}$$

The second moment about $x = 0$ is similarly

$$\mu'_2 = \int_0^\infty x^2\phi(x)\,dx = \Gamma(l + 2)/\Gamma(l) = l(l + 1).$$

* This belongs to Karl Pearson's Type III. See Kendall, 1943, 2, pp. 137-43.

Hence the variance is given by

$$\sigma^2 = l(l + 1) - l^2 = l. \tag{17}$$

Example. Show that the rth moment about $x = 0$ is

$$\mu'_r = l(l + 1)(l + 2)\ldots(l + r - 1),$$

and deduce that the third moment about the mean is $2l$.
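
The moments just obtained follow from the single relation $\mu'_r = \Gamma(l + r)/\Gamma(l)$, which is easily evaluated. The sketch below uses `math.gamma` from the Python standard library; the value of the parameter l is a hypothetical choice.

```python
import math

def raw_moment(l, r):
    """rth moment about x = 0 of a Gamma variate with parameter l."""
    return math.gamma(l + r) / math.gamma(l)

l = 2.5                                    # hypothetical parameter
m1, m2, m3 = (raw_moment(l, r) for r in (1, 2, 3))

variance = m2 - m1 ** 2                          # should equal l, by (17)
third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3   # should equal 2l
```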


An important example of a Gamma variate is associated with the
normal distribution. If x is normally distributed with mean a and
S.D. $\sigma$, the probability that a random value of the variate will
fall in the interval dx is

$$dP = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-a)^2/2\sigma^2}\,dx. \tag{18}$$

Let u be defined by

$$u = \tfrac{1}{2}(x-a)^2/\sigma^2,$$

so that, as x varies from $-\infty$ to $+\infty$, u varies from $+\infty$
to 0 and then from 0 to $+\infty$. For values of x between a and $+\infty$,

$$x - a = \sigma\sqrt{2u}$$

and

$$dx = \sigma\,du/\sqrt{2u},$$

so that

$$dP = \frac{e^{-u}u^{-1/2}\,du}{2\sqrt{\pi}}.$$

But the probability that u falls in the interval du is double this,
since there is an equal probability that u will fall in this interval
when x lies between $-\infty$ and a. Consequently the probability
differential for the variate u is

$$dP = \frac{e^{-u}u^{-1/2}\,du}{\Gamma(\tfrac{1}{2})},$$

so that u is a Gamma variate with parameter $\tfrac{1}{2}$. We may
therefore state

Theorem I. If x is normally distributed with mean a and standard
deviation $\sigma$, then $\tfrac{1}{2}(x-a)^2/\sigma^2$ is a Gamma variate
with parameter $\tfrac{1}{2}$.

A Gamma variate with parameter l may be referred to briefly as
a $\gamma(l)$ variate, the symbol being used adjectively.*

* The objection to describing it as a $\Gamma(l)$ variate is the
additional meaning thus given to the symbol $\Gamma(l)$.
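
Theorem I lends itself to a simple sampling check, since a Gamma variate
with parameter 1/2 has mean 1/2 and variance 1/2. The Python sketch below
is illustrative only; the function name, the values of a and sigma, the
sample size and the seed are assumptions and do not come from the text.

```python
import random

def half_squared_deviates(a, sigma, size, seed=0):
    """Draw x from a normal population and return u = (x - a)^2 / (2 sigma^2)."""
    rng = random.Random(seed)
    return [0.5 * ((rng.gauss(a, sigma) - a) / sigma) ** 2 for _ in range(size)]

u = half_squared_deviates(a=10.0, sigma=3.0, size=200_000)
mean_u = sum(u) / len(u)
var_u = sum((value - mean_u) ** 2 for value in u) / len(u)

# Both figures should be near 1/2 if u behaves as a Gamma variate
# with parameter 1/2.
print(mean_u, var_u)
```

Agreement of the first two moments does not of course prove the theorem;
it merely gives a quick check on the algebra.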
68. Sum of independent Gamma variates

The moments of the Gamma distribution, and the distribution
of the sum of independent Gamma variates, may be deduced from
the moment generating function. The m.g.f. of a $\gamma(l)$ variate with
respect to the origin is given by

$$M(t) = \int_0^\infty \frac{e^{tx}e^{-x}x^{l-1}}{\Gamma(l)}\,dx
       = \int_0^\infty \frac{e^{-(1-t)x}x^{l-1}}{\Gamma(l)}\,dx
       = \frac{1}{(1-t)^l}, \tag{19}$$

and the cumulative function is therefore

$$K(t) = -l\log(1-t) = l\left(t + \frac{t^2}{2} + \frac{t^3}{3} + \dots\right). \tag{20}$$

Thus the mean of the distribution is l, and the variance also l,
while the other cumulants are

$$\kappa_3 = 2!\,l, \quad \dots, \quad \kappa_r = (r-1)!\,l. \tag{21}$$
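
The cumulants (21) may be confirmed from the moments about x = 0 given in
the Example of § 67. The Python sketch below is an illustration, not part
of the original text: it assumes the standard recursion connecting
moments and cumulants, and the function names and the value l = 3 are
chosen only for the example.

```python
from math import comb, factorial

def raw_moments(l, order):
    """Moments about x = 0 of a Gamma variate: mu'_r = l(l+1)...(l+r-1)."""
    moments = [1]
    for r in range(1, order + 1):
        moments.append(moments[-1] * (l + r - 1))
    return moments

def cumulants_from_moments(moments):
    """Return kappa_1..kappa_n using
    mu'_n = sum_{k=1}^{n} C(n-1, k-1) kappa_k mu'_{n-k}."""
    order = len(moments) - 1
    kappa = [0] * (order + 1)
    for n in range(1, order + 1):
        correction = sum(comb(n - 1, k - 1) * kappa[k] * moments[n - k]
                         for k in range(1, n))
        kappa[n] = moments[n] - correction
    return kappa[1:]

l = 3                                   # illustrative integer parameter
kappa = cumulants_from_moments(raw_moments(l, 5))
expected = [factorial(r - 1) * l for r in range(1, 6)]
print(kappa, expected)
```

With an integral value of l the arithmetic is exact, so the two lists
should agree term by term.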

Suppose now that x and y are independent Gamma variates with
parameters l and m respectively. Then the m.g.f. of their sum, being
equal to the product of their m.g.f.'s, is $(1-t)^{-(l+m)}$. But this is
the m.g.f. of a $\gamma(l+m)$ variate. We thus have

Theorem II. The sum of two independent Gamma variates, with
parameters l and m, is a Gamma variate with parameter l + m.

The converse of this theorem is almost equally important. It
may be stated:

Theorem III. If the sum of two independent positive variates is a
Gamma variate with parameter l + m, and one of them is a Gamma
variate with parameter l, then the other is a Gamma variate with
parameter m.

For, if M(t) denotes the m.g.f. of the last variate, we have, on
equating the m.g.f. of the sum to the product of those of its
components,

$$(1-t)^{-(l+m)} = (1-t)^{-l}\,M(t),$$

whence $M(t) = (1-t)^{-m}$,
and the second component is therefore a $\gamma(m)$ variate.
On account of the importance of these theorems we shall prove
Theorem II from first principles. Let x and y be independent $\gamma(l)$
and $\gamma(m)$ variates, and let z = x + y. Suppose first that y has a
fixed value in the interval dy. Then dz = dx. The probability that a
random value of x will fall in the interval dx is

$$dp = e^{-x}x^{l-1}\,dx/\Gamma(l),$$

and therefore the probability that, for a given value of y, z will lie
in the interval dz is

$$dp = e^{-(z-y)}(z-y)^{l-1}\,dz/\Gamma(l).$$

But the chance that y will have a value in the interval dy is

$$dp' = e^{-y}y^{m-1}\,dy/\Gamma(m).$$

The probability that simultaneously z will lie in the interval dz
and y in the interval dy is the product dp dp'. By integration we
then have the probability that, for any value of y, z will lie in the
interval dz as

$$dP = \frac{e^{-z}\,dz}{\Gamma(l)\Gamma(m)}\int_0^z (z-y)^{l-1}y^{m-1}\,dy.$$

To evaluate the integral put y = zt, so that dy = z dt. Since z is the
sum of the positive variates x and y, it is never less than y, so that
t lies within the range 0 to 1. Consequently

$$dP = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l)\Gamma(m)}\int_0^1 (1-t)^{l-1}t^{m-1}\,dt
     = \frac{B(l,m)}{\Gamma(l)\Gamma(m)}\,e^{-z}z^{l+m-1}\,dz
     = \frac{e^{-z}z^{l+m-1}\,dz}{\Gamma(l+m)},$$

and z is therefore a $\gamma(l+m)$ variate as stated.
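
The integration over y in this proof can be imitated numerically: for a
given z the integral of the product of the two densities should reproduce
the density of a Gamma variate with parameter l + m. The Python sketch
below is illustrative only; the function names, the parameter values and
the midpoint rule are assumptions, not part of the original text.

```python
import math

def gamma_density(x, l):
    """Density e^(-x) x^(l-1) / Gamma(l) of a Gamma variate with parameter l."""
    return math.exp(-x + (l - 1.0) * math.log(x) - math.lgamma(l))

def density_of_sum(z, l, m, steps=20_000):
    """Midpoint-rule value of the integral over y from 0 to z of
    (density of x at z - y) * (density of y at y)."""
    h = z / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += gamma_density(z - y, l) * gamma_density(y, m)
    return total * h

l, m, z = 2.5, 4.0, 5.0                 # illustrative values
by_integration = density_of_sum(z, l, m)
by_theorem = gamma_density(z, l + m)
print(by_integration, by_theorem)
```

The two printed numbers should agree to several decimal places, whatever
positive values of l, m and z are tried.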

Repeated application of this theorem shows that the sum of n
independent Gamma variates with parameters $m_i$ (i = 1, 2, ..., n) is
a Gamma variate with parameter $\sum m_i$. In particular, in virtue of
Theorem I, we may state

Theorem IV. If $x_i$ (i = 1, 2, ..., n) are n independent variates,
normally distributed about a common mean zero with standard
deviations $\sigma_i$, and $\chi^2 = \sum_i x_i^2/\sigma_i^2$, then
$\tfrac{1}{2}\chi^2$ is a Gamma variate with parameter $\tfrac{1}{2}n$.
69. Beta distribution of the first kind

In virtue of the first definition, (7), of the Beta function we shall
say that a continuous variate x, which is distributed with
probability density

$$\phi(x) = \frac{x^{l-1}(1-x)^{m-1}}{B(l,m)} \tag{22}$$

throughout the range of values 0 to 1, is a Beta variate of the first
kind with parameters l and m; and its distribution is a Beta
distribution of the first kind.* Such a variate may be referred to
briefly as a $\beta_1(l, m)$ variate. The reader should sketch the
probability curve for the distribution, distinguishing the different
cases. He will find that, if l and m are each greater than 1, there is
a modal value

$$(l-1)/(l+m-2).$$

If l > 2 the curve touches the x-axis at the origin; while, if 1 < l < 2,
it is tangent to the y-axis at that point. If, however, 0 < l < 1, the
curve is asymptotic to the y-axis. Similar remarks hold for the
shape of the curve near x = 1, according to the value of m.

The mean value of x is given by

$$\bar{x} = \int_0^1 x\,\phi(x)\,dx = \frac{B(l+1,m)}{B(l,m)} = \frac{l}{l+m}. \tag{23}$$

The second moment $\mu_2'$ about x = 0 is similarly

$$\mu_2' = \frac{B(l+2,m)}{B(l,m)} = \frac{l(l+1)}{(l+m)(l+m+1)}.$$

From these it follows that the variance is

$$\sigma^2 = \frac{lm}{(l+m)^2(l+m+1)}. \tag{24}$$

Corresponding to Theorem II there is a fundamental theorem
which may be stated:

Theorem V. If x and y are independent Gamma variates with
parameters l and m respectively, the quotient x/(x + y) is a Beta variate
of the first kind with parameters l and m.

This may be proved from first principles. If we write

$$z = \frac{x}{x+y}, \quad\text{then}\quad x = \frac{yz}{1-z},$$

and since x and y are both positive, the range of z is from 0 to 1.
Suppose first that y has a fixed value in the interval dy. Then

$$dx = y\,dz/(1-z)^2.$$

The probability that, for this value of y, x will lie in the interval dx
and therefore z in the interval dz is

$$dp = \frac{e^{-x}x^{l-1}\,dx}{\Gamma(l)}
     = \frac{1}{\Gamma(l)}\exp\left(-\frac{yz}{1-z}\right)
       \left(\frac{yz}{1-z}\right)^{l-1}\frac{y\,dz}{(1-z)^2}.$$

But the chance that y will have a value in the interval dy is

$$dp' = \frac{e^{-y}y^{m-1}\,dy}{\Gamma(m)}.$$

The probability that simultaneously z will lie in the interval dz and
y in the interval dy is the product dp dp'. By integration we then find
the probability that, for any value of y, z will lie in the interval dz as

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)(1-z)^{l+1}}
       \int_0^\infty e^{-y/(1-z)}\,y^{l+m-1}\,dy.$$

To evaluate the integral put y = (1 - z)t. Then the limits for t are
0 and $\infty$, and we obtain

$$dP = \frac{z^{l-1}(1-z)^{m-1}\,\Gamma(l+m)\,dz}{\Gamma(l)\Gamma(m)}
     = \frac{z^{l-1}(1-z)^{m-1}\,dz}{B(l,m)},$$

showing that z is a $\beta_1(l, m)$ variate as stated.

* This belongs to Karl Pearson's Type I.
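
Theorem V may be illustrated by random sampling: the quotient x/(x + y)
should have the mean l/(l + m) and the variance lm/((l + m)^2 (l + m + 1))
of a Beta variate of the first kind. The Python sketch below is an
illustration only. It relies on the standard-library generator
random.gammavariate with unit scale; the function name, the parameter
values, the sample size and the seed are assumptions.

```python
import random

def quotients(l, m, size, seed=1):
    """Return values of x / (x + y) for independent Gamma variates x, y."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        x = rng.gammavariate(l, 1.0)    # shape l, scale 1
        y = rng.gammavariate(m, 1.0)    # shape m, scale 1
        out.append(x / (x + y))
    return out

l, m = 2.0, 3.0                          # illustrative parameters
z = quotients(l, m, 100_000)
mean_z = sum(z) / len(z)
var_z = sum((value - mean_z) ** 2 for value in z) / len(z)

theory_mean = l / (l + m)
theory_var = l * m / ((l + m) ** 2 * (l + m + 1.0))
print(mean_z, theory_mean)
print(var_z, theory_var)
```

The sample figures should differ from the theoretical ones only by
amounts attributable to random sampling.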

We may remark in passing that, if z is a $\beta_1(l, m)$ variate and
v = 1 - z, then v is a $\beta_1(m, l)$ variate. For the range of v is
also from 0 to 1, and |dv| = |dz|. Expressing the probability
differential for z in terms of v and dv, we obtain

$$dP = \frac{(1-v)^{l-1}v^{m-1}\,|dv|}{B(l,m)},$$

which shows that v is a $\beta_1(m, l)$ variate. And the reader will
observe that dP depends on the magnitude |dv| of the interval dv, but
not upon the sign of dv/dz.


70. Alternative proof of theorems

A combined proof of Theorems II and V may be given more
simply as follows.* As before let x and y be independent $\gamma(l)$ and
$\gamma(m)$ variates respectively. Then the probability that a random
value of x will fall in the interval dx and, at the same time, a random
value of y will fall in the interval dy is

$$dP = \frac{e^{-(x+y)}x^{l-1}y^{m-1}\,dx\,dy}{\Gamma(l)\Gamma(m)}.$$

Now introduce the new variables

$$u = x + y, \quad v = x/(x+y),$$

so that x = uv, y = u(1 - v).

Then as x and y range from 0 to $\infty$, u ranges from 0 to $\infty$ and
v from 0 to 1. Also

$$\frac{\partial(x,y)}{\partial(u,v)} = -u.$$

From the above expression for the probability differential dP it
follows that the probability density in the joint distribution of
x and y is

$$\frac{e^{-(x+y)}x^{l-1}y^{m-1}}{\Gamma(l)\Gamma(m)}
  = \frac{e^{-u}u^{l+m-2}v^{l-1}(1-v)^{m-1}}{\Gamma(l)\Gamma(m)}.$$

But the area of the element of the xy-plane bounded by the curves
along which u has the values u and u + du respectively, and those
along which v has the values v and v + dv is

$$\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv = u\,du\,dv.$$

Hence the probability that, in a random selection of x and y, the
representative point (x, y) will fall in this element of area is

$$dP = \frac{e^{-u}u^{l+m-1}\,du}{\Gamma(l+m)}\cdot
       \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}. \tag{25}$$

Since this is the probability that simultaneously u will fall in the
interval du and v in the interval dv it follows that these variates
are independent, and that u is a $\gamma(l+m)$ variate and v a
$\beta_1(l, m)$ variate, as stated in the above theorems.

* This proof is given by Sawkins, 1940, 3, p. 212.
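
The independence of u and v asserted here can be examined by sampling.
The Python sketch below is illustrative and not part of the original
text; the function names, parameters, sample size and seed are
assumptions. It computes the correlation between u = x + y and
v = x/(x + y), and also compares the mean of v in the lower and upper
halves of the sample when ordered by u. Absence of correlation is of
course only a necessary consequence of independence, not a proof of it.

```python
import math
import random

def sum_and_share(l, m, size, seed=2):
    """Return lists of u = x + y and v = x / (x + y) for Gamma variates x, y."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(size):
        x = rng.gammavariate(l, 1.0)
        y = rng.gammavariate(m, 1.0)
        us.append(x + y)
        vs.append(x / (x + y))
    return us, vs

def correlation(us, vs):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n
    var_u = sum((u - mean_u) ** 2 for u in us) / n
    var_v = sum((v - mean_v) ** 2 for v in vs) / n
    return cov / math.sqrt(var_u * var_v)

us, vs = sum_and_share(2.0, 3.0, 100_000)
r = correlation(us, vs)

# Mean of v among the smaller and the larger values of u.
ordered = sorted(zip(us, vs))
half = len(ordered) // 2
mean_v_low = sum(v for _, v in ordered[:half]) / half
mean_v_high = sum(v for _, v in ordered[half:]) / (len(ordered) - half)
print(r, mean_v_low, mean_v_high)
```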
71. Product of a $\beta_1(l, m)$ variate and a $\gamma(l+m)$ variate

We may now prove an important property, suggested by Theorem
V, which may be stated:

Theorem VI. The product of a $\beta_1(l, m)$ variate and an independent
$\gamma(l+m)$ variate is a $\gamma(l)$ variate.

Let v be the $\beta_1(l, m)$ variate and u the $\gamma(l+m)$ variate, and
let z = uv. The probability differential for v is

$$dp = \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}.$$

Hence, for a fixed value of u in the interval du, the probability that
z will fall in the interval dz is

$$dp = \frac{1}{B(l,m)}\left(\frac{z}{u}\right)^{l-1}
       \left(1-\frac{z}{u}\right)^{m-1}\frac{dz}{u}.$$

Multiply this by the probability that a random value of u will fall
in the interval du, and integrate over the range of u from z to $\infty$
(since $z \le u$). Thus we have the probability that, for any value of
u, z will lie in the interval dz as

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)}
       \int_z^\infty e^{-u}u^{m-1}\left(1-\frac{z}{u}\right)^{m-1}du
     = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)}
       \int_z^\infty e^{-u}(u-z)^{m-1}\,du.$$

To evaluate the integral put u - z = zt. Then t ranges from 0 to
$\infty$, and we have for the probability differential of z,

$$dP = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)\Gamma(m)}
       \int_0^\infty e^{-zt}z^m t^{m-1}\,dt
     = \frac{e^{-z}z^{l-1}\,dz}{\Gamma(l)}.$$

Consequently z is a $\gamma(l)$ variate as stated.
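
A sampling illustration of Theorem VI is easily arranged, since a Gamma
variate with parameter l has mean l and variance l. The Python sketch
below is not part of the original text: it assumes the standard-library
generators random.betavariate and random.gammavariate, and the function
name, parameters, sample size and seed are illustrative.

```python
import random

def products(l, m, size, seed=5):
    """Return values of z = u v, with v a Beta variate of the first kind
    (parameters l, m) and u an independent Gamma variate (parameter l + m)."""
    rng = random.Random(seed)
    return [rng.betavariate(l, m) * rng.gammavariate(l + m, 1.0)
            for _ in range(size)]

l, m = 2.0, 3.0                          # illustrative parameters
z = products(l, m, 100_000)
mean_z = sum(z) / len(z)
var_z = sum((value - mean_z) ** 2 for value in z) / len(z)

# Both figures should be near l if z is a Gamma variate with parameter l.
print(mean_z, var_z)
```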


72. Beta distribution of the second kind

In agreement with the alternative definition of B(m, n) contained
in (11) we may define a Beta variate of the second kind, with positive
parameters l and m, as a continuous variate x which is distributed with
probability density

$$\phi(x) = \frac{x^{l-1}}{B(l,m)\,(1+x)^{l+m}} \tag{27}$$

throughout the range x = 0 to x = $\infty$. Such a variate may be
referred to briefly as a $\beta_2(l, m)$ variate. Its distribution is a
Beta distribution of the second kind.* It is important to remember that,
while the values of a Beta variate of the first kind are from 0 to 1,
those of a Beta variate of the second kind are from 0 to $\infty$. The
reader should sketch the different forms of the probability curve for the
latter distribution. He will observe a considerable resemblance to the
case of a Gamma distribution. If l > 1 there is a mode at

$$x = (l-1)/(m+1). \tag{28}$$

The curve is asymptotic to the x-axis; and, if l > 2, it touches this
axis also at the origin. If 1 < l < 2 the curve touches the y-axis at
the origin; and if 0 < l < 1 the curve is asymptotic to both axes.
When m > 1 the mean value of the variate is given by

$$\bar{x} = \frac{1}{B(l,m)}\int_0^\infty \frac{x^{l}\,dx}{(1+x)^{l+m}}
          = \frac{B(l+1,\,m-1)}{B(l,m)} = \frac{l}{m-1}. \tag{29}$$

Also, if m > 2, the second moment about x = 0 is

$$\mu_2' = \frac{1}{B(l,m)}\int_0^\infty \frac{x^{l+1}\,dx}{(1+x)^{l+m}}
         = \frac{B(l+2,\,m-2)}{B(l,m)} = \frac{l(l+1)}{(m-1)(m-2)}.$$

Consequently the variance is

$$\sigma^2 = \frac{l(l+1)}{(m-1)(m-2)} - \frac{l^2}{(m-1)^2}
           = \frac{l(l+m-1)}{(m-1)^2(m-2)}. \tag{30}$$
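
The formulae (29) and (30) may be verified by numerical integration. The
Python sketch below is illustrative only. It assumes the substitution
x = t/(1 - t), which carries the range 0 to infinity of x into the range
0 to 1 of t and turns the rth moment into the integral of
t^(l+r-1) (1-t)^(m-r-1) / B(l, m); the function names, the parameter
values and the midpoint rule are likewise assumptions.

```python
import math

def beta_function(l, m):
    """B(l, m) = Gamma(l) Gamma(m) / Gamma(l + m)."""
    return math.exp(math.lgamma(l) + math.lgamma(m) - math.lgamma(l + m))

def beta2_moment(r, l, m, steps=100_000):
    """r-th moment about x = 0 of a Beta variate of the second kind (r < m),
    evaluated by the midpoint rule after putting x = t / (1 - t)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (l + r - 1.0) * (1.0 - t) ** (m - r - 1.0)
    return total * h / beta_function(l, m)

l, m = 3.0, 6.0                          # illustrative parameters, m > 2
mean = beta2_moment(1, l, m)
second = beta2_moment(2, l, m)
variance = second - mean ** 2

print(mean, l / (m - 1.0))
print(variance, l * (l + m - 1.0) / ((m - 1.0) ** 2 * (m - 2.0)))
```

For these values the mean should be 0.6 and the variance 0.24.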


We may observe that, if x is a Beta variate of the second kind,
its reciprocal is a variate of the same kind with parameters
interchanged. For, if x is a $\beta_2(l, m)$ variate, its probability
density is given by (27). If then we put x = 1/y, we have
$|dx| = |dy|/y^2$; and therefore, since y lies in the interval dy when x
lies in the interval dx, the probability of this is

$$dP = \frac{(1/y)^{l-1}\,(dy/y^2)}{B(l,m)\,(1+1/y)^{l+m}}
     = \frac{y^{m-1}\,dy}{B(l,m)\,(1+y)^{l+m}}.$$

Consequently y is a $\beta_2(m, l)$ variate.

* This belongs to Karl Pearson's Type VI.
An important relation between the two kinds of Beta variates
is expressed by

Theorem VII. To each Beta variate of the first kind corresponds a
pair of Beta variates of the second kind; and conversely.

Let v be a $\beta_1(l, m)$ variate, so that its probability differential is

$$dP = \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l,m)}, \tag{31}$$

and let w be defined by

$$v = 1/(1+w) \tag{32}$$

or its equivalent w = (1 - v)/v.

Then $|dv| = |dw|/(1+w)^2$, and by substitution we have the
probability differential of w as

$$dP = \frac{w^{m-1}\,dw}{B(l,m)\,(1+w)^{l+m}}. \tag{33}$$

Since the range of w is from 0 to $\infty$, it follows that w is a
$\beta_2(m, l)$ variate. Its reciprocal is therefore a $\beta_2(l, m)$
variate, and the first part of the theorem is proved.

Conversely, given that w is a $\beta_2(m, l)$ variate with probability
differential (33), we may define a variate v by means of (32).
Substitution then shows that the probability differential of v is given
by (31); and since v ranges from 0 to 1, it is a $\beta_1(l, m)$ variate.
The variable 1 - v is therefore a $\beta_1(m, l)$ variate, and the second
part of the theorem is proved.


73. Quotient of independent Gamma variates

Corresponding to Theorem V, according to which a Beta variate
of the first kind is determined by two independent Gamma variates,
we have

Theorem VIII. The quotient of two independent Gamma variates,
with parameters l and m, is a $\beta_2(l, m)$ variate.

Let x and y be independent Gamma variates with parameters l
and m respectively, and let v = y/(y + x). Then, in virtue of Theorem
V, v is a $\beta_1(m, l)$ variate. But, if w is the quotient x/y, clearly

$$v = \frac{1}{1+x/y} = \frac{1}{1+w},$$

so that, by Theorem VII, w is a $\beta_2(l, m)$ variate as stated.

The theorem may also be proved by the method of § 70, which
leads to the additional information that the sum x + y is distributed
independently of the quotient x/y. For if

$$u = x + y, \quad v = x/y,$$

we have

$$\frac{\partial(u,v)}{\partial(x,y)} = -\frac{x+y}{y^2} = -\frac{(1+v)^2}{u},$$

and therefore

$$\frac{\partial(x,y)}{\partial(u,v)} = -\frac{u}{(1+v)^2}.$$

Since the probability that x and y fall simultaneously in the
intervals dx and dy is

$$dp = \frac{e^{-(x+y)}x^{l-1}y^{m-1}\,dx\,dy}{\Gamma(l)\Gamma(m)},$$

the probability density for their joint distribution is

$$\frac{e^{-(x+y)}x^{l-1}y^{m-1}}{\Gamma(l)\Gamma(m)}
  = \frac{e^{-u}u^{l+m-2}v^{l-1}}{\Gamma(l)\Gamma(m)\,(1+v)^{l+m-2}}.$$

Consequently the probability that, in a random choice of x and y,
the representative point (x, y) will fall in the area bounded by the
curves u, u + du, v, v + dv is

$$dP = \frac{e^{-u}u^{l+m-1}\,du}{\Gamma(l+m)}\cdot
       \frac{v^{l-1}\,dv}{B(l,m)\,(1+v)^{l+m}}.$$

Since this is the probability that u and v will fall simultaneously in
the intervals du and dv respectively, it follows that u is a
$\gamma(l+m)$ variate, and v a $\beta_2(l, m)$ variate, and also that
these variates are independent.

Example 1. Cauchy's distribution. The distribution of the quotient of two
independent standard normal variates follows from the above theorem. For
if z is this quotient, $z^2$ is the quotient of two independent Gamma
variates, each of parameter $\tfrac{1}{2}$. Consequently $z^2$ is a
$\beta_2(\tfrac{1}{2}, \tfrac{1}{2})$ variate, with probability
differential

$$dP = \frac{(z^2)^{-1/2}\,d(z^2)}{B(\tfrac{1}{2},\tfrac{1}{2})\,(1+z^2)}
     = \frac{2\,dz}{\pi(1+z^2)}.$$

The distribution of z follows immediately from this. For, since the range
of z is from $-\infty$ to $+\infty$ while that of $z^2$ is from 0 to
$+\infty$, the probability differential of z is

$$dP = \frac{dz}{\pi(1+z^2)},$$

the factor 2 disappearing, since the integral from $-\infty$ to $+\infty$
must be equal to unity. This distribution is associated with the name of
Cauchy.
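
The Cauchy form may be compared with the results of random sampling. The
Python sketch below is illustrative and not part of the original text;
the function names, the bounds tested, the sample size and the seed are
assumptions. It uses the fact that the integral of dz/(pi(1 + z^2)) from
minus infinity to z is 1/2 + arctan(z)/pi.

```python
import math
import random

def normal_quotients(size, seed=3):
    """Quotients of pairs of independent standard normal variates."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(size)]

def cauchy_probability(z):
    """Probability of a value not exceeding z under Cauchy's distribution."""
    return 0.5 + math.atan(z) / math.pi

z = normal_quotients(200_000)
bounds = (-2.0, 0.0, 1.0, 3.0)
observed = [sum(value <= b for value in z) / len(z) for b in bounds]
theory = [cauchy_probability(b) for b in bounds]

for b, obs, th in zip(bounds, observed, theory):
    print(b, round(obs, 4), round(th, 4))
```

In particular about one half of the quotients should lie between -1 and
+1, these being the quartiles of the distribution.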
Example 2. Show that, if x and y are independent normal variates with
means $m_1$, $m_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively,
the quotient

$$z = (x - m_1)/(y - m_2)$$

conforms to the distribution

$$dp = \frac{\sigma_1\sigma_2\,dz}{\pi(\sigma_1^2 + \sigma_2^2 z^2)},$$

with a range $-\infty$ to $+\infty$.
EXAMPLES VIII

1. We shall see later (cf. § 82) that, if r is the coefficient of
correlation between the variates in a random sample of n pairs of values
from an uncorrelated bivariate normal population, then $r^2$ is a
$\beta_1(\tfrac{1}{2}, \tfrac{1}{2}(n-2))$ variate. Hence show that the
mean value of $r^2$ for such samples is 1/(n - 1), and the S.E. of r
therefore $1/\sqrt{n-1}$. Also show that the probability differential of
r is

$$dP = \frac{(1-r^2)^{\frac{1}{2}(n-4)}\,dr}{B(\tfrac{1}{2}, \tfrac{1}{2}(n-2))}.$$

2. Quotient of independent Gamma variates. Theorem VIII may
be proved by direct integration, as in the cases of Theorems II, V
and VI. Let z = x/y, where x and y are independent $\gamma(l)$ and
$\gamma(m)$ variates. Then for a fixed value of y in the interval dy,
dx = y dz, and the probability that z will lie in the interval dz is

$$dp = e^{-yz}(yz)^{l-1}\,y\,dz/\Gamma(l).$$

Consequently the probability that, for any value of y, z will lie in
the interval dz is

$$dP = z^{l-1}\,dz\int_0^\infty y^{l+m-1}e^{-y(1+z)}\,dy\,/\,\Gamma(l)\Gamma(m)
     = \frac{z^{l-1}\,dz}{B(l,m)\,(1+z)^{l+m}},$$

showing that z is a $\beta_2(l, m)$ variate as required.

3. Show that, for the $\gamma(l)$ distribution,

$$(\text{mean} - \text{mode})/\sigma = 1/\sqrt{l} = \tfrac{1}{2}\mu_3/\sigma^3.$$

Hence some writers prefer $\tfrac{1}{2}\mu_3/\sigma^3$ to
$\mu_3/\sigma^3$ as a definition of skewness.

Show that the excess of kurtosis of the distribution is 6/l.

4. Show that the mean value of the positive square root of a
$\gamma(l)$ variate is $\Gamma(l+\tfrac{1}{2})/\Gamma(l)$. Hence prove
that the mean deviation of a normal variate from its mean is
$\sigma\sqrt{2/\pi}$.

5. Prove that the mean value of the positive square root of a
$\beta_1(l, m)$ variate is
$\Gamma(l+\tfrac{1}{2})\,\Gamma(l+m)/\Gamma(l)\,\Gamma(l+m+\tfrac{1}{2})$.
Hence, using the distribution of $r^2$ given in Ex. 1, for samples from
an uncorrelated bivariate normal population, show that the mean value of
|r| is $\Gamma(\tfrac{1}{2}(n-1))/\Gamma(\tfrac{1}{2}n)\sqrt{\pi}$.

6. A simple sample of n values is drawn from a population with a
$\gamma(l)$ distribution. Show that, if $\bar{x}$ is the mean of the
sample, $n\bar{x}$ is a $\gamma(nl)$ variate; and deduce that
$E(\bar{x}) = l$, and that the sampling variance of $\bar{x}$ is l/n.

7. A simple sample of n values is drawn from a population with
the exponential distribution whose probability density is $ae^{-ax}$,
$(0 \le x)$. Show that, if $\bar{x}$ is the mean of the sample,
$na\bar{x}$ is a $\gamma(n)$ variate; and deduce that
$E(\bar{x}) = 1/a$, and that the S.E. of $\bar{x}$ is $1/(a\sqrt{n})$.

8. Show that, if v is the square of a $\gamma(l)$ variate, its
probability differential is

$$dp = \frac{e^{-\sqrt{v}}\,v^{\frac{1}{2}l-1}\,dv}{2\Gamma(l)}.$$

9. Show that, if $x_i$ (i = 1, ..., n) are n independent Gamma variates
with parameters $m_i$, any homogeneous function $F(x_1, \dots, x_n)$ of
these variates, of degree 0, is distributed independently of the sum
$\sum_i x_i$. (Cf. Pitman, 1937, 8, pp. 216-17.)

This may be done by considering the m.g.f. of the simultaneous
distribution of $\sum x_i$ and F. As explained in § 31 this m.g.f. is

$$M(t_1, t_2) = C\int_0^\infty\!\!\cdots\!\int_0^\infty
  \prod_i x_i^{m_i-1}\exp\Big(-\sum_i x_i\Big)
  \exp\Big(t_1\sum_i x_i + t_2F\Big)\,dx_1\dots dx_n,$$

where $1/C = \Gamma(m_1)\dots\Gamma(m_n)$.

Substituting $(1-t_1)x_i = y_i$ we have

$$M(t_1, t_2) = C\,(1-t_1)^{-\sum m_i}\int_0^\infty\!\!\cdots\!\int_0^\infty
  \prod_i y_i^{m_i-1}\exp\Big(-\sum_i y_i\Big)
  \exp\big(t_2F(y_1, \dots, y_n)\big)\,dy_1\dots dy_n,$$

since F is homogeneous of degree zero. Thus

$$M(t_1, t_2) = (\text{function of } t_1)\,(\text{function of } t_2),$$

and the variates F and $\sum x_i$ are therefore independent.

10. Given that the incomplete Beta function $B_x(l, m)$ is defined by

$$B_x(l, m) = \int_0^x t^{l-1}(1-t)^{m-1}\,dt,$$

and that $I_x(l, m) = B_x(l, m)/B(l, m)$, prove the relations

$$I_x(l, m) = 1 - I_{1-x}(m, l)$$

and

$$I_x(l+1, m+1) = x\,I_x(l, m+1) + (1-x)\,I_x(l+1, m).$$

11. Simultaneous sampling distribution of the mean and the
variance. The independence of the sampling distributions of the
mean $\bar{x}$ and the variance $S^2$ of a random sample of n values from
a normal population, was proved by Fisher from the simultaneous
sampling distribution of these statistics. The probability that the
n values of the sample will fall in the respective intervals
$dx_1, \dots, dx_n$ is, by the theorem of compound probability,

$$dp = (\sigma\sqrt{2\pi})^{-n}\exp\Big[-\sum_i (x_i-\mu)^2/2\sigma^2\Big]\,dx_1\,dx_2\dots dx_n,$$

the n values being chosen independently from the normal population
of mean $\mu$ and variance $\sigma^2$. But

$$\sum_i (x_i-\mu)^2 = \sum_i (x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 = nS^2 + n(\bar{x}-\mu)^2,$$

so that

$$dp = (\sigma\sqrt{2\pi})^{-n}\exp[-n(\bar{x}-\mu)^2/2\sigma^2]\,\exp(-nS^2/2\sigma^2)\,dx_1\dots dx_n.$$

Fisher proved by geometrical reasoning* that this probability
differential is expressible as

$$dp = C\exp[-n(\bar{x}-\mu)^2/2\sigma^2]\,d\bar{x}\;\exp(-nS^2/2\sigma^2)\,S^{n-2}\,dS,$$

where C is a constant. Since this is of the form

$$dp = \phi_1(\bar{x})\,d\bar{x}\cdot\phi_2(S)\,dS,$$

the distributions of $\bar{x}$ and S are independent. The forms of
$\phi_1(\bar{x})$ and $\phi_2(S)$ show that $\bar{x}$ is normally
distributed, with mean $\mu$ and variance $\sigma^2/n$, and that
$nS^2/2\sigma^2$ is a Gamma variate with parameter $\tfrac{1}{2}(n-1)$.

Another proof of the independence of $\bar{x}$ and $S^2$ will be given in
§ 77.

12. Show that the rth moment of a $\beta_1(l, m)$ distribution about
x = 0 is

$$\frac{l(l+1)\dots(l+r-1)}{(l+m)(l+m+1)\dots(l+m+r-1)}.$$

13. Show that, if r is less than m, the rth moment of a $\beta_2(l, m)$
distribution about x = 0 is

$$\frac{l(l+1)\dots(l+r-1)}{(m-1)(m-2)\dots(m-r)}.$$

14. Defining the harmonic mean (H.M.) of a variate x as the
reciprocal of the expected value of 1/x show that, if l > 1, the H.M.
of a $\gamma(l)$ variate is l - 1, that of a $\beta_1(l, m)$ variate is
(l - 1)/(l + m - 1), and that of a $\beta_2(l, m)$ variate is (l - 1)/m,
m being positive.

* Fisher, 1926, 1, pp. 92-3.
CHAPTER IX

CHI-SQUARE AND SOME APPLICATIONS

74. Chi-square and its distribution

We have already seen (§ 68, Theorem IV) that if $x_i$ (i = 1, 2, ..., n)
are n independent variates normally distributed about a common
mean zero with standard deviations $\sigma_i$, and

$$\chi^2 = \sum_i x_i^2/\sigma_i^2, \tag{1}$$

then $\tfrac{1}{2}\chi^2$ is a Gamma variate with parameter
$\tfrac{1}{2}n$. The distribution of $\chi^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac{1}{2}n)}\,e^{-\frac{1}{2}\chi^2}
       (\tfrac{1}{2}\chi^2)^{\frac{1}{2}n-1}\,d(\tfrac{1}{2}\chi^2), \tag{2}$$

which expresses the probability that the value of $\chi^2$ found from a
random sample will fall in the interval $d\chi^2$. This distribution is
often referred to as the $\chi^2$ distribution, and a variate conforming
to it is said to be distributed like $\chi^2$. Since $\chi^2$ is twice a
$\gamma(\tfrac{1}{2}n)$ variate, its mean value is n and its modal value
n - 2, as proved in § 67. Similarly, the variance of $\chi^2$ is 2n.
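
The mean, variance and modal value just stated may be checked by
numerical work on the density of chi-square. The Python sketch below is
an illustration only; the function names, the choice n = 6, the
truncation of the range at 200 and the midpoint rule are assumptions and
not part of the original text. The density is taken with respect to
chi-square itself, which introduces the factor 1/2 into the coefficient
of (2).

```python
import math

def chi_square_density(c, n):
    """Probability density of chi-square (value c) for n independent variates."""
    half = 0.5 * n
    log_density = -0.5 * c + (half - 1.0) * math.log(0.5 * c) - math.lgamma(half)
    return 0.5 * math.exp(log_density)

def integrate(f, upper, steps=100_000):
    """Midpoint rule on the range (0, upper)."""
    h = upper / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))

n = 6                                    # illustrative number of variates
total = integrate(lambda c: chi_square_density(c, n), 200.0)
mean = integrate(lambda c: c * chi_square_density(c, n), 200.0)
second = integrate(lambda c: c * c * chi_square_density(c, n), 200.0)
variance = second - mean ** 2

# Modal value located by a search over a fine grid of values of chi-square.
grid = [k * 0.001 for k in range(1, 20_001)]
mode = max(grid, key=lambda c: chi_square_density(c, n))

print(total, mean, variance, mode)
```

For n = 6 the figures should be close to 1, 6, 12 and 4 respectively.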

The distribution (2) was discovered by Helmert in 1876, and
rediscovered independently in 1900 by Karl Pearson, who devised
by means of it the $\chi^2$ test of 'goodness of fit' which will soon be
considered. We must first, however, examine the effect of one or
more linear relations between the variates and, to make this
clearer, we shall digress briefly to remind the reader of the
properties of an orthogonal linear transformation.

75. Orthogonal linear transformation

Let the n variates $x_i$ be subjected to the linear transformation

$$\xi_i = \sum_j c_{ij}x_j \quad (i, j = 1, \dots, n). \tag{3}$$

If the constant coefficients $c_{ij}$ are such that

$$\sum_i \xi_i^2 = \sum_i x_i^2, \tag{4}$$

the transformation is said to be orthogonal. In this case the
coefficients satisfy the relations

$$\sum_j c_{ij}^2 = 1 = \sum_j c_{ji}^2 \tag{5}$$

and

$$\sum_k c_{ik}c_{jk} = 0 = \sum_k c_{ki}c_{kj} \quad (i \ne j). \tag{6}$$

These relations may be expressed verbally by saying that, in the
determinant $|c_{ij}|$ of the coefficients, the sum of the squares of the
elements in any row or in any column is equal to unity, while the
sum of the products of corresponding elements in any two rows or
in any two columns is equal to zero. The determinant of the
coefficients is equal to $\pm 1$; and consequently the Jacobian of the
$\xi$'s with respect to the x's is also $\pm 1$. By changing the sign of
one of the $\xi$'s, if necessary, we may ensure that the value is +1; and
we shall assume in all cases that this has been done. It follows that

$$\frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)} = 1. \tag{7}$$

It is easy to prove that, if the x's are statistically independent
variates, normally distributed about zero with unit S.D., so are
the $\xi$'s. For, the probability that simultaneously the n values of
the variates will fall in the respective intervals $dx_i$ is the product
of the probabilities for the individual variates, and is therefore
given by

$$dP = (2\pi)^{-\frac{1}{2}n}\exp\Big(-\tfrac{1}{2}\sum_i x_i^2\Big)\,dx_1\,dx_2\dots dx_n.$$

The probability density for the joint distribution of the variates
is therefore

$$(2\pi)^{-\frac{1}{2}n}\exp\Big(-\tfrac{1}{2}\sum_i x_i^2\Big)
  = (2\pi)^{-\frac{1}{2}n}\exp\Big(-\tfrac{1}{2}\sum_i \xi_i^2\Big).$$

But the 'volume' of the element bounded by the n pairs of
hypersurfaces $\xi_i$, $\xi_i + d\xi_i$ is

$$\frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)}\,d\xi_1\dots d\xi_n
  = d\xi_1\dots d\xi_n,$$

so that the probability that the $\xi$'s will fall simultaneously in their
respective intervals is expressible as

$$dP = (2\pi)^{-\frac{1}{2}}\exp(-\tfrac{1}{2}\xi_1^2)\,d\xi_1\;\cdots\;
       (2\pi)^{-\frac{1}{2}}\exp(-\tfrac{1}{2}\xi_n^2)\,d\xi_n.$$

From the form of this expression it follows that the $\xi$'s are
statistically independent, and are normally distributed about zero with
unit S.D.

The transformation (3) corresponds to a rotation of rectangular
axes in Euclidean space of n dimensions. In the choice of the
coefficients $c_{ij}$ there is a degree of arbitrariness. For instance the
coefficients for $\xi_1$ may be chosen first, subject only to the condition

$$\sum_j c_{1j}^2 = 1,$$

which ensures that the constants $c_{1j}$ are the components of a unit
vector. When $\xi_1$ has been determined $\xi_2$ may be chosen in an
infinity of ways, subject only to the conditions

$$\sum_j c_{2j}^2 = 1, \quad \sum_j c_{1j}c_{2j} = 0,$$

which express that $c_{2j}$ are components of a unit vector, orthogonal
to the vector whose components are $c_{1j}$. At each step there is freedom
of choice of an axis orthogonal to those already chosen; and only
in the case of the nth axis is there no freedom of choice.
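
One definite choice of coefficients satisfying (5) and (6) is that often
associated with the name of Helmert, in which the first row is
(1, ..., 1)/sqrt(n). The Python sketch below is illustrative and not part
of the original text; this particular choice of matrix, the function
names and the numerical values of the x's are assumptions. It verifies
the row conditions and the invariance (4) of the sum of squares. No
attempt is made to fix the sign of the determinant.

```python
import math

def helmert_matrix(n):
    """An n-by-n orthogonal matrix whose first row is (1, ..., 1)/sqrt(n)."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for i in range(1, n):
        scale = 1.0 / math.sqrt(i * (i + 1))
        # i equal entries, then one balancing entry, then zeros.
        rows.append([scale] * i + [-i * scale] + [0.0] * (n - i - 1))
    return rows

def transform(c, x):
    """Apply the linear transformation xi_i = sum_j c_ij x_j."""
    return [sum(cij * xj for cij, xj in zip(row, x)) for row in c]

def row_products(c):
    """Sums of products of corresponding elements of every pair of rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in c] for r in c]

n = 4
c = helmert_matrix(n)
x = [1.0, -2.0, 0.5, 3.0]                # illustrative values of the variates
xi = transform(c, x)

sum_sq_x = sum(v * v for v in x)
sum_sq_xi = sum(v * v for v in xi)
print(sum_sq_x, sum_sq_xi)
print(xi[0], sum(x) / math.sqrt(n))
```

With this choice the first new variate is the sum of the x's divided by
sqrt(n), that is, sqrt(n) times their mean.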

76. Linear constraints. Degrees of freedom

As above let the variates $x_i$ be normally distributed about zero
as mean, with unit S.D. To begin with we assume that they are
functionally independent; but presently we shall impose on them
the linear restriction

$$\sum_i a_ix_i = 0, \tag{8}$$

an equation which may be divided throughout by the constant
necessary to make $\sum_i a_i^2 = 1$; and we assume that this has been
done.

If the variates $x_i$ are independent, we may obtain by an orthogonal
linear transformation a set of n statistically independent variates
$\xi_i$, each of which is normally distributed about zero as mean, with
unit S.D.; and this may be done so that $\xi_1$ is identical with the
first member of (8). Now let the condition (8) be imposed on the x's.
This is equivalent to putting $\xi_1 = 0$; and, since the variates
$\xi_2, \dots, \xi_n$ are statistically independent of $\xi_1$, their
distributions are unaltered by the condition so imposed. Consequently

$$\tfrac{1}{2}\chi^2 = \tfrac{1}{2}\sum_{i=1}^{n} x_i^2 = \sum_{i=2}^{n}\tfrac{1}{2}\xi_i^2,$$

where the $\tfrac{1}{2}\xi_i^2$ are Gamma variates, each with parameter
$\tfrac{1}{2}$. Thus $\tfrac{1}{2}\chi^2$ is the sum of n - 1 independent
$\gamma(\tfrac{1}{2})$ variates, and is therefore itself a Gamma variate
with parameter $\tfrac{1}{2}(n-1)$. The distribution of $\chi^2$ is
therefore obtained from (2) by putting n - 1 in place of n. The
condition (8) has reduced the number of independent variates by
one. The number of independent variates is usually called the
number of degrees of freedom (D.F.), or briefly the number of freedoms.
The term is borrowed from geometry and mechanics, where the
position of a point or of a body is specified by a number of functionally
independent variables called coordinates. Each independent
coordinate corresponds to one degree of freedom of movement. Any
constraint on the body reduces the number of degrees of freedom.
For this reason a linear relation between the variates $x_i$ is called a
linear constraint. We shall assume that only linear constraints are
involved.

An appeal to geometry throws light on the above reasoning. If
the variables $x_i$ are regarded as rectangular Cartesian coordinates
of the current point P in Euclidean space of n dimensions, the
square of the distance of P from the origin O is

$$OP^2 = \sum_{i=1}^{n} x_i^2 = \chi^2$$

and, in terms of the alternative set of coordinates $\xi_i$ relative to
rectangular axes through the same origin,

$$OP^2 = \sum_{i=1}^{n} \xi_i^2.$$

When the variables are connected by the equation (8), the point P
is constrained to lie on a hyperplane through the origin, determined
by this equation. In terms of the $\xi$'s the equation of the hyperplane
is simply

$$\xi_1 = 0,$$

and for a point P on this hyperplane we have

$$\chi^2 = OP^2 = \sum_{i=2}^{n} \xi_i^2$$

as above. Thus the linear constraint (8) restricts the freedom of
movement of P to the hyperplane, in which there are only n - 1
independent coordinates, and $\chi^2$ is then said to correspond to
n - 1 D.F.

That each linear constraint reduces the number of freedoms by
unity may be shown in a similar manner. Thus if there is a second
linear constraint

$$\sum_i b_ix_i = 0, \tag{9}$$

it is expressible in terms of the $\xi$'s in the form

$$\sum_{i=2}^{n} b_i'\xi_i = 0, \tag{10}$$

in which $\sum b_i'^2 = 1$. We obtain $\chi^2$ as the sum of squares of
n - 1 standard normal variates $\xi_i$ (i = 2, ..., n), which are now
connected by the linear constraint (10). Proceeding as before by an
appropriate orthogonal transformation

$$\eta_i = \sum_{j=2}^{n} c_{ij}'\xi_j \quad (i = 2, \dots, n),$$

in which $\eta_2$ is identical with the first member of (10), we may
express $\chi^2$ as

$$\chi^2 = \sum_{i=3}^{n} \eta_i^2,$$

and $\tfrac{1}{2}\chi^2$ is thus a Gamma variate with parameter
$\tfrac{1}{2}(n-2)$, so that $\chi^2$ corresponds to n - 2 D.F. The
argument holds for any number of linear constraints. Following Yule and
Kendall* we shall denote the number of freedoms by $\nu$. Then, if the n
variates are subject to m linear constraints,

$$\nu = n - m. \tag{11}$$

In place of (2) we thus have for the distribution of $\chi^2$
corresponding to $\nu$ D.F.,

$$dP = \frac{1}{\Gamma(\tfrac{1}{2}\nu)}\,e^{-\frac{1}{2}\chi^2}
       (\tfrac{1}{2}\chi^2)^{\frac{1}{2}\nu-1}\,d(\tfrac{1}{2}\chi^2). \tag{12}$$

* 1937, 1, p. 415.
Since $\tfrac{1}{2}\chi^2$ is a $\gamma(\tfrac{1}{2}\nu)$ variate, the
mean value of $\chi^2$ is $\nu$ and its modal value is $\nu - 2$. The
probability that the value of $\chi^2$ from a random sample will not
exceed a fixed value $\chi_0^2$ is obtained by integrating (12) with
respect to $\chi^2$ from 0 to $\chi_0^2$. Similarly, the probability that
$\chi^2$ will exceed $\chi_0^2$ is the integral* of (12) with respect to
$\chi^2$ from $\chi_0^2$ to $\infty$. The accompanying Table 3 gives the
values of $\chi_0^2$ for values of $\nu$ from 1 to 30, and for various
fixed values of the probability P of exceeding $\chi_0^2$. In other
words, P is the probability that in random sampling the value of $\chi^2$
shown in the body of the table will be exceeded.

Example. Show that, for 2 D.F., the probability P of a value of $\chi^2$
greater than $\chi_0^2$ is $\exp(-\tfrac{1}{2}\chi_0^2)$; and hence that
$\chi_0^2 = 2\log(1/P)$.
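
The result of this Example permits the row of Table 3 for 2 D.F. to be
recomputed directly. The Python sketch below is illustrative and not part
of the original text; the function names are assumptions. The third
function rests on a further assumption, namely the finite sum obtained
when (12) is integrated by parts repeatedly for an even number of
freedoms, a result which is not stated in the text itself.

```python
import math

def tail_probability_2df(chi2_0):
    """Probability of chi-square exceeding chi2_0 for 2 D.F."""
    return math.exp(-0.5 * chi2_0)

def critical_value_2df(p):
    """Value of chi-square exceeded with probability p, for 2 D.F."""
    return 2.0 * math.log(1.0 / p)

def tail_probability_even(chi2_0, nu):
    """Probability of exceeding chi2_0 for an even number nu of D.F.

    Assumed formula (integration of (12) by parts):
    exp(-h) * (1 + h + h^2/2! + ... + h^(nu/2 - 1)/(nu/2 - 1)!), h = chi2_0/2.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("nu must be a positive even integer")
    half = 0.5 * chi2_0
    term, total = 1.0, 0.0
    for k in range(nu // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

for p in (0.99, 0.95, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01):
    print(p, round(critical_value_2df(p), 3))
```

The values printed for P = 0.05 and P = 0.01 should round to 5.99 and
9.21, the figures tabulated for 2 D.F.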

77. Distribution of the 'sum of squares' for a random sample
from a normal population

Consider a random sample of n independent values $x_i$ from a
normal population of variance $\sigma^2$. If as usual $\bar{x}$ denotes
the mean of the sample, the sum of squares of the deviations from the
mean is

$$nS^2 = \sum_i (x_i - \bar{x})^2. \tag{13}$$

This is the 'sum of squares' to be considered; and we shall prove
that $nS^2/\sigma^2$ is distributed like $\chi^2$ with n - 1 D.F. We know
that

$$\sum_i x_i^2 = \sum_i (x_i-\bar{x})^2 + n\bar{x}^2 = nS^2 + n\bar{x}^2. \tag{14}$$

If then we introduce an orthogonal linear transformation (3) of the
variables such† that $\xi_1 = \bar{x}\sqrt{n}$, we have by (14)

$$nS^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n}\xi_i^2 - \xi_1^2 = \sum_{i=2}^{n}\xi_i^2,$$

and therefore

$$nS^2/\sigma^2 = \sum_{i=2}^{n} \xi_i^2/\sigma^2. \tag{15}$$

Now, with origin at the mean of the population, the x's are normally
distributed about zero as mean with S.D. $\sigma$, and so also are the
$\xi$'s. Consequently $\tfrac{1}{2}nS^2/\sigma^2$ is a Gamma variate with
parameter $\tfrac{1}{2}(n-1)$, and its distribution is

$$dP = \frac{1}{\Gamma(\tfrac{1}{2}(n-1))}\exp\left(\frac{-nS^2}{2\sigma^2}\right)
       \left(\frac{nS^2}{2\sigma^2}\right)^{\frac{1}{2}(n-3)}
       d\left(\frac{nS^2}{2\sigma^2}\right).$$

In other words $nS^2/\sigma^2$ is distributed like $\chi^2$ with n - 1
D.F.

It is convenient to express the result in terms of the statistic $s^2$
of § 54, which is an unbiased estimate of the variance of the
population. Thus

$$nS^2 = (n-1)s^2 = \nu s^2,$$

and the distribution of $s^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac{1}{2}\nu)}\exp\left(\frac{-\nu s^2}{2\sigma^2}\right)
       \left(\frac{\nu s^2}{2\sigma^2}\right)^{\frac{1}{2}(\nu-2)}
       d\left(\frac{\nu s^2}{2\sigma^2}\right), \tag{16}$$

the estimate $s^2$ being based on $\nu$ D.F.

The coefficient of $ds^2$ in (16) is the probability density in the
distribution of $s^2$. Suppose that the population variance is unknown,
and that we enquire what value of it would make the probability
density of $s^2$ a maximum for the given value of $s^2$. This is obtained
by equating to zero its derivative with respect to $\sigma^2$. The
procedure leads to $\sigma^2 = s^2$. For this reason $s^2$ is called the
optimum value of the population variance corresponding to the given
sample; and the method of obtaining it is called the method of maximum
likelihood. It is due to R. A. Fisher.

In the above argument $\xi_1$ is independent of $\xi_2, \dots, \xi_n$;
and therefore, in virtue of (15), $\bar{x}$ is independent of $S^2$. Thus
the sampling distributions of the mean and the variance are independent.
And, since $\xi_1/\sigma$ is a standard normal variate, so is
$\bar{x}\sqrt{n}/\sigma$. It follows that $\bar{x}$ is normally
distributed with the same mean as the population, and with variance
$\sigma^2/n$.

* For a method of evaluating this integral see Fisher, 1935, 1, pp. 356-7.
See also Ex. IX, 9.
† Cf. Sawkins, 1940, 3, p. 226.
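
The conclusions of this section may be illustrated by repeated sampling
from a normal population. The Python sketch below is an illustration
only and is not part of the original text; the function names, the
population constants, the sample size n = 5, the number of samples and
the seed are assumptions. If the theory is correct, the sum of squares
divided by the population variance should have mean n - 1 and variance
2(n - 1), the sample mean should have variance sigma^2/n, and the two
statistics should be uncorrelated.

```python
import math
import random

def simulate(n, mu, sigma, trials, seed=4):
    """Return the sample means and the values of n S^2 / sigma^2
    for a number of random samples of n values from a normal population."""
    rng = random.Random(seed)
    means, scaled_sums = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        sum_of_squares = sum((x - xbar) ** 2 for x in sample)   # n S^2
        means.append(xbar)
        scaled_sums.append(sum_of_squares / sigma ** 2)
    return means, scaled_sums

def mean_and_variance(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def correlation(a, b):
    ma, va = mean_and_variance(a)
    mb, vb = mean_and_variance(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)
    return cov / math.sqrt(va * vb)

n, mu, sigma = 5, 10.0, 2.0              # illustrative constants
means, scaled_sums = simulate(n, mu, sigma, trials=40_000)
mean_ss, var_ss = mean_and_variance(scaled_sums)
mean_xbar, var_xbar = mean_and_variance(means)
r = correlation(means, scaled_sums)

print(mean_ss, var_ss)      # theory: n - 1 and 2(n - 1)
print(mean_xbar, var_xbar)  # theory: mu and sigma^2 / n
print(r)                    # theory: zero
```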


78. Nature of the chi-square test. An illustration

The $\chi^2$ test is a means of judging the credibility of an hypothesis
concerning the population (or populations) from which the values
of the sample (or samples) are drawn. The hypothesis to be tested
must be of such a nature that we can determine, from the sample
values of the variate, the corresponding value of a certain statistic,
which is distributed like $\chi^2$ for a known number of freedoms. The
table then tells us the probability P that, in random sampling, a
value of this statistic will occur greater than the value actually
obtained. If this probability is very small, we regard the value
obtained as significantly large, and conclude that the hypothesis is
probably incorrect.

Table 3. Values of $\chi^2$ with probability P of
being exceeded in random sampling

$\nu$ = number of degrees of freedom

  ν   P = 0.99     0.95     0.50     0.30     0.20     0.10     0.05     0.01

  1     0.0002    0.004     0.46     1.07     1.64     2.71     3.84     6.64
  2     0.020     0.103     1.39     2.41     3.22     4.60     5.99     9.21
  3     0.115     0.35      2.37     3.66     4.64     6.25     7.82    11.34
  4     0.30      0.71      3.36     4.88     5.99     7.78     9.49    13.28
  5     0.55      1.14      4.35     6.06     7.29     9.24    11.07    15.09
  6     0.87      1.64      5.35     7.23     8.56    10.64    12.59    16.81
  7     1.24      2.17      6.35     8.38     9.80    12.02    14.07    18.48
  8     1.65      2.73      7.34     9.52    11.03    13.36    15.51    20.09
  9     2.09      3.32      8.34    10.66    12.24    14.68    16.92    21.67
 10     2.56      3.94      9.34    11.78    13.44    15.99    18.31    23.21
 11     3.05      4.58     10.34    12.90    14.63    17.28    19.68    24.72
 12     3.57      5.23     11.34    14.01    15.81    18.55    21.03    26.22
 13     4.11      5.89     12.34    15.12    16.98    19.81    22.36    27.69
 14     4.66      6.57     13.34    16.22    18.15    21.06    23.68    29.14
 15     5.23      7.26     14.34    17.32    19.31    22.31    25.00    30.58
 16     5.81      7.96     15.34    18.42    20.46    23.54    26.30    32.00
 17     6.41      8.67     16.34    19.51    21.62    24.77    27.59    33.41
 18     7.02      9.39     17.34    20.60    22.76    25.99    28.87    34.80
 19     7.63     10.12     18.34    21.69    23.90    27.20    30.14    36.19
 20     8.26     10.85     19.34    22.78    25.04    28.41    31.41    37.57
 21     8.90     11.59     20.34    23.86    26.17    29.62    32.67    38.93
 22     9.54     12.34     21.34    24.94    27.30    30.81    33.92    40.29
 23    10.20     13.09     22.34    26.02    28.43    32.01    35.17    41.64
 24    10.86     13.85     23.34    27.10    29.55    33.20    36.42    42.98
 25    11.52     14.61     24.34    28.17    30.68    34.38    37.65    44.31
 26    12.20     15.38     25.34    29.25    31.80    35.56    38.88    45.64
 27    12.88     16.15     26.34    30.32    32.91    36.74    40.11    46.96
 28    13.56     16.93     27.34    31.39    34.03    37.92    41.34    48.28
 29    14.26     17.71     28.34    32.46    35.14    39.09    42.56    49.59
 30    14.95     18.49     29.34    33.53    36.25    40.26    43.77    50.89

Reproduced by permission of the author, Professor R. A. Fisher, from his
book on Statistical Methods for Research Workers.

Two conventional values of P are employed in deciding significance,
viz. 0.05 and 0.01. These determine the 5% and the 1%
levels of significance respectively. If the value of P obtained is
greater than 0.05 we infer that, in more than 5% of random samples
from the population in question, the value of $\chi^2$ obtained would be
greater than that actually found, which is therefore regarded as not
significantly large. If, however, P is less than 0.05 the value found
is regarded as significant at that level. Similar remarks apply to the
1% level of significance. Values which are significant at the 1%
level of probability are said to be highly significant, and are sometimes
distinguished by a double asterisk. Significance at the 5%
level is then denoted by a single asterisk.

The value of $\chi^2$ obtained from the sample may also be significantly
small. A glance at the probability curve of $\chi^2$ will help to
make this point clear. When the number $\nu$ of degrees of freedom is
greater than 2, this curve has the form indicated in the diagram,
except that when $\nu = 3$ it touches the y-axis at the origin, and when
$\nu = 4$ it touches neither axis at that point. The ordinate for any
value of $\chi^2$ is the probability density for that value; and this is small
for small values of $\chi^2$. When the value of P found from the table is
greater than 0.95, the probability of obtaining a smaller value of $\chi^2$
is less than 5%, and the sample value must be regarded as significantly
small. Similarly, when P is greater than 0.99 the probability
of a smaller value of $\chi^2$ is less than 1%, and the smallness of the
sample value is highly significant.

[Diagram: probability curve of $\chi^2$ for $\nu > 2$]

As an illustration of the $\chi^2$ test we may consider the following.
Let $x_1, x_2, \dots, x_n$ be a random sample of n values from a normal population
of variance $\sigma^2$, $\bar{x}$ the sample mean and

$$nS^2 = \sum_i (x_i - \bar{x})^2 = (n-1)s^2.$$

Then we know that $nS^2/\sigma^2$ is distributed like $\chi^2$ with $n-1$ d.f.
If then we are given the sample values but have no certain knowledge
of the population from which they were drawn, we may test
the hypothesis that they came from a normal population of s.d. $\sigma$.
For, when $nS^2$ has been found from the sample, our hypothesis gives
$nS^2/\sigma^2$ as the sample value of $\chi^2$; the table then enables us to
decide whether this value is significant or not, that is to say whether
our hypothesis is improbable or not.

Example. A random sample of 12 values gave an unbiased estimate $s^2$ of
the population variance equal to 10.62 mm.² May the sample be reasonably
regarded as from a normal population with variance 7 mm.²?

Here $nS^2 = 11 \times 10.62 = 116.82$, and, according to our hypothesis
concerning the population,

$$\chi^2 = 116.82/7 = 16.7.$$

The number of d.f. is 11. From the table we see that the probability, P, that
the value of $\chi^2$ in such samples will exceed 16.7 is greater than 0.10. The
value found is therefore not significant, and the test provides no evidence
against the hypothesis of a normal population with variance 7.
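
The arithmetic of this Example may be checked by a short computation. The following is only a sketch: the function name `chi2_sf` is our own, and the tail probability is obtained from the ordinary power series for the incomplete Gamma function rather than from Table 3.

```python
import math

def chi2_sf(x, nu, max_terms=500):
    """P(chi-square with nu d.f. exceeds x), from the series
    P(a, z) = z^a e^(-z) * sum_{k>=0} z^k / Gamma(a + k + 1),
    with a = nu/2 and z = x/2 (regularized lower incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = nu / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))  # k = 0 term
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)          # ratio of successive series terms
        total += term
        if term < 1e-15 * total:
            break
    return max(0.0, 1.0 - total)

# Figures of the Example: n = 12, s^2 = 10.62, hypothetical variance 7
n, s2, sigma2_0 = 12, 10.62, 7.0
chi2 = (n - 1) * s2 / sigma2_0       # n S^2 / sigma^2
p = chi2_sf(chi2, n - 1)
print(round(chi2, 2), round(p, 3))
```

The probability so found lies between 0.10 and 0.20, in agreement with the entries 17.28 and 14.63 for 11 d.f. in the table.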

79. Test of goodness of fit

We shall now consider the $\chi^2$ test of goodness of fit devised by
Karl Pearson. As in Chapter VII let the population be one whose
members may be separated into a number k of classes, and let $p_i$
be the relative frequency of the ith class. In the choice of a simple
sample of n members from this population, the probability at any
drawing that the individual selected will belong to the ith class is $p_i$.
The frequency $f_i$ of this class in sampling has a mean value $m_i$
given by

$$m_i = n p_i, \tag{17}$$

and the distribution of $f_i$ is the binomial. For large values of n the
binomial distribution approximates to normal; and the sampling
distribution of any class frequency is therefore approximately
normal for large samples.

Next suppose that we are given a set of class frequencies $f_i$,
whose sum is n, without any information about the source of the
values. We may wish to enquire whether they may reasonably be
regarded as those of a simple sample from a certain hypothetical
population. The population is hypothetical in the sense that it is
determined by an hypothesis, which must be such as to enable us to
calculate the expected values $m_i$ of the class frequencies in random
sampling from it. We shall prove that, for large samples, the variate
$\sum_i (f_i - m_i)^2/m_i$ conforms approximately to the $\chi^2$ distribution, and
shall show how the number of degrees of freedom is determined.
The value of $\chi^2$ obtained from the sample enables us to test the
credibility of our hypothesis.

Since, for large samples, the class frequency $f_i$ is distributed
approximately normally about $m_i$ as mean, the variate

$$x_i = f_i - m_i \tag{18}$$

is distributed approximately normally about zero as mean. We
require the variance $\sigma_i^2$ of $x_i$ when the class frequencies are entirely
independent; and, to obtain this, we must take account* of the
variation in their sum n. Thus

$$n = \sum_i f_i, \tag{19}$$

and the size n of the sample varies about a mean $\bar{n}$ with s.d. $\sigma_n$.
Since the frequencies are supposed independent, the variance of
n is given by

$$\sigma_n^2 = \sum_i \sigma_i^2. \tag{20}$$

Now, in virtue of §47 (8), the variance $\sigma_i^2$ for samples of varying
size is

$$\sigma_i^2 = \bar{n} p_i q_i + p_i^2 \sigma_n^2. \tag{21}$$

Summing for all the classes we have, by (20),

$$\sigma_n^2 = \bar{n} \sum_i p_i q_i + \sigma_n^2 \sum_i p_i^2,$$

or, since $\sum p_i = 1$,

$$\sigma_n^2 \sum_i (p_i - p_i^2) = \bar{n} \sum_i p_i q_i.$$

Hence $\sigma_n^2 = \bar{n}$, so that (21) is equivalent to

$$\sigma_i^2 = (p_i q_i + p_i^2)\,\bar{n} = p_i \bar{n} = m_i. \tag{22}$$

The quantity $\chi^2$ defined by

$$\chi^2 = \sum_i \frac{x_i^2}{m_i} = \sum_i \frac{(f_i - m_i)^2}{m_i} \tag{23}$$

is a sum of squares of standard approximately normal deviates,
and is therefore distributed approximately like $\chi^2$.

* Cf. Fisher, 1922, 1, p. 88.

For samples of fixed size, the number of d.f. is clearly less than
the number k of classes. For the sum of the class frequencies is
constant, and this corresponds to a linear constraint on the variates
$x_i$. Further, to determine the theoretical class frequencies it
is sometimes necessary to estimate parameters of the population
from the data of the sample. For instance, in testing the hypothesis
of a normal population, it may be necessary to estimate the mean
and the variance of the population from the sample values. Each
estimate of a parameter obtained in this manner corresponds to the
introduction of a linear constraint. For the moments about the
mean are linear, or approximately linear, functions of the class
frequencies; and, in equating such a function to a parameter, we
introduce an approximately linear constraint on the variates $x_i$.
In calculating the number of freedoms of $\chi^2$ each constraint introduced
in this manner must be recognized.

The approximation of the binomial distribution to normal, when
n is large, does not hold for very small values of p or q. Hence the
above argument is not valid if one of the class frequencies is small;
for that would make the corresponding relative frequency $f/n$ very
small, and therefore the probability p for that class in sampling
from the population also very small. Classes of small frequency
may be treated by combining two or more of them to form a class
sufficiently large.
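
The counting of freedoms and the combining of small classes may be sketched in a few lines. This is an illustration only: the function names, the frequencies, and the threshold of 5 for a "sufficiently large" expected frequency are all our own assumptions (the threshold is a common convention, not a figure given in the text).

```python
def pool_small_classes(observed, expected, min_expected=5.0):
    """Merge neighbouring classes, left to right, until each pooled class has
    expected frequency >= min_expected (an assumed, conventional threshold)."""
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0:                      # a small leftover tail joins the last class
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out

def pearson_chi2(observed, expected, estimated_params=0):
    """Sum of (f - m)^2 / m, with d.f. = classes - 1 - parameters estimated."""
    chi2 = sum((f - m) ** 2 / m for f, m in zip(observed, expected))
    dof = len(observed) - 1 - estimated_params
    return chi2, dof

# Hypothetical class frequencies (not taken from the text); both sum to 100
observed = [2, 9, 30, 38, 16, 4, 1]
expected = [3.0, 11.0, 27.0, 33.0, 19.0, 5.0, 2.0]

obs_p, exp_p = pool_small_classes(observed, expected)
chi2, dof = pearson_chi2(obs_p, exp_p, estimated_params=2)
print(obs_p, exp_p, round(chi2, 3), dof)
```

Here seven classes are reduced to five by pooling, and with two parameters supposed estimated from the sample the number of freedoms is 5 - 1 - 2 = 2.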


80. Numerical examples 


Example 1. Can the wages of 1,000 employees, given in Ex. I, 3 (p. 17), be
regarded as a random sample from a normal population?

Our hypothesis is that the population is normal. Since the sample is large
its mean and its variance are taken as estimates of those of the population.
In Ex. III, 6 (p. 60) the class frequencies per thousand of the normal population
are given. On account of the smallness of the extreme frequencies we combine
the first two classes, and also the last two, leaving 13 classes. Since the sum
of the class frequencies is constant, and two of the parameters were estimated
from the sample, the number of degrees of freedom is 10. The value of $\chi^2$
from the sample is

$$\chi^2 = \frac{6^2}{18} + \frac{10^2}{25} + \frac{14^2}{79} + \dots + \frac{(3.2)^2}{18} = 19.84.$$


From the table we see that, for 10 d.f., this value of $\chi^2$ is significant at the
5% level. We conclude that the assumption of a normal population is
probably incorrect.

Example 2. From the adult male populations of seven large cities, random
samples of the sizes indicated below were taken, and the numbers of married
and single men recorded. Do the data indicate any significant variation
among the cities in the tendency of men to marry?

City       A    B    C    D    E    F    G   Total
Married   133  164  155  106  153  123  146    980
Single     36   57   40   37   55   33   36    294
Total     169  221  195  143  208  156  182   1274

We test the hypothesis that there is no significant variation in the tendency
mentioned. Then the men from each city may be regarded as a simple
sample from a population in which the ratio of married men to single is
approximately the same as in the column of totals. This ratio is 10:3. The
theoretical frequencies for any city are then obtained by dividing the total
for that city into two parts in this ratio. These frequencies are:

City       A    B    C    D    E    F    G   Total
Married   130  170  150  110  160  120  140    980
Single     39   51   45   33   48   36   42    294
Total     169  221  195  143  208  156  182   1274

From these figures we have

$$\chi^2 = \frac{3^2}{130} + \frac{6^2}{170} + \dots + \frac{6^2}{42} = 5.34.$$

To find the number of freedoms we observe that the sum of the frequencies
of married and single men from any city is constant, being equal to the size
of the sample from that city. This reduces the number of independent frequencies
to 7. And further, a parameter of the population was estimated from
the sample, namely, the ratio of the numbers of married and single men.
Consequently $\nu = 6$. For this number of freedoms the probability of obtaining
a larger value of $\chi^2$ than 5.34 is about 0.50. The value is therefore not significant,
and the test furnishes no evidence against the hypothesis.
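
The working of Example 2 may be verified as follows. The sketch takes the observed frequencies from the table of the Example; the helper `chi2_sf_even` is our own name for the finite sum that gives the tail probability of $\chi^2$ when the number of d.f. is even (the closed form $e^{-x}(1 + x + x^2/2! + \dots + x^r/r!)$ with $x = \tfrac12\chi^2$, $r = \tfrac12(\nu-2)$).

```python
import math

married = [133, 164, 155, 106, 153, 123, 146]   # cities A..G
single = [36, 57, 40, 37, 55, 33, 36]

total_m, total_s = sum(married), sum(single)
p_married = total_m / (total_m + total_s)        # 980/1274, i.e. the ratio 10:3

chi2 = 0.0
for m, s in zip(married, single):
    size = m + s
    exp_m = size * p_married                     # theoretical married frequency
    exp_s = size - exp_m                         # theoretical single frequency
    chi2 += (m - exp_m) ** 2 / exp_m + (s - exp_s) ** 2 / exp_s

def chi2_sf_even(x, nu):
    """P(chi-square > x) for an even number nu of d.f. (finite sum)."""
    half, r = x / 2.0, (nu - 2) // 2
    term = total = 1.0
    for k in range(1, r + 1):
        term *= half / k                         # builds half^k / k!
        total += term
    return math.exp(-half) * total

nu = len(married) - 1                            # 7 independent frequencies, 1 estimated ratio
p = chi2_sf_even(chi2, nu)
print(round(chi2, 2), nu, round(p, 2))
```

The computed value of $\chi^2$ is about 5.34 with 6 d.f., and the probability of a larger value is close to one half.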


Example 3. May the data in Ex. III, 7 (p. 61) be regarded as those of a
random sample from a Poissonian distribution?

The mean, m = 1.2, was estimated from the sample, and from it the
theoretical frequencies per thousand of the Poissonian distribution were
calculated in the example referred to. To apply the $\chi^2$ test we combine the
last three classes, whose combined theoretical frequency is 7.6. Then

$$\chi^2 = \frac{(3.8)^2}{301.2} + \frac{(4.4)^2}{361.4} + \dots = 3.5.$$

Here the number of classes is 6; but the total frequency is constant, and m
was estimated from the sample, so that $\nu = 4$. With 4 d.f. the probability
that $\chi^2$ will exceed 3.5 is nearly 0.50. The value is therefore not at all significant,
and the assumption of a Poissonian population is not discredited.

81. Additive property of chi-square

Theorem I. If the independent variates x and y conform to the $\chi^2$
distribution, with $\nu_1$ and $\nu_2$ d.f. respectively, then $x + y$ is distributed
like $\chi^2$ with $\nu_1 + \nu_2$ d.f.

For $\tfrac12 x$ and $\tfrac12 y$ are independent Gamma variates with parameters
$\tfrac12\nu_1$ and $\tfrac12\nu_2$ respectively. Therefore, by §68, Theorem II, $\tfrac12(x+y)$ is
a Gamma variate with parameter $\tfrac12(\nu_1+\nu_2)$. Consequently $x+y$
conforms to the $\chi^2$ distribution with $\nu_1 + \nu_2$ d.f.

Similarly, corresponding to §68, Theorem III, we may state

Theorem II. If the sum of two independent positive variates is
distributed like $\chi^2$ with $\nu_1 + \nu_2$ d.f., and one of them is distributed like
$\chi^2$ with $\nu_1$ d.f., then the other is distributed like $\chi^2$ with $\nu_2$ d.f.

In the same way Theorem V of §69 and Theorem VIII of §73
may be expressed as

Theorem III. If the independent variates x and y are distributed
like $\chi^2$ with $\nu_1$ and $\nu_2$ d.f. respectively, then $x/(x + y)$ is a Beta variate
of the first kind with parameters $\tfrac12\nu_1$ and $\tfrac12\nu_2$, while $x/y$ is a Beta variate
of the second kind with the same parameters.

82. Samples from an uncorrelated bivariate normal population.
Distribution of the correlation coefficient

The distribution of the coefficient of correlation, r, in samples
from a normally correlated bivariate population was given by
Fisher* in 1915. In the particular case of an uncorrelated population
($\rho = 0$) the distribution of r is very simple. Let $\sigma_1$ and $\sigma_2$ be
the s.d.'s of x and y in the population, and let the variates be measured
from their means. Consider a random sample of n pairs of
values $x_i$, $y_i$ from such a population. The variances $S_1^2$, $S_2^2$ of x, y
in the sample are given by

$$nS_1^2 = \sum_i (x_i - \bar{x})^2, \qquad nS_2^2 = \sum_i (y_i - \bar{y})^2,$$

and the correlation, r, in the sample is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{nS_1S_2}.$$

* Fisher, 1915, 2. A simple and excellent proof of a different character
has recently been published by Sawkins (1944, 1). The proof for $\rho = 0$ given
in this section is based on that of Sawkins.


Owing to the nature of the sampling the n values $y_i$ are independent.
Let these be subjected to an orthogonal transformation
yielding n variates $\eta_i$, the first of which may be taken as

$$\eta_1 = (y_1 + y_2 + \dots + y_n)/\sqrt{n} = \sqrt{n}\,\bar{y},$$

since the sum of squares of the coefficients of the $y_i$ is unity. Then

$$\sum_{i=1}^{n} \eta_i^2 = \sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n\bar{y}^2 = nS_2^2 + \eta_1^2,$$

so that

$$nS_2^2 = \sum_{i=2}^{n} \eta_i^2, \tag{i}$$

and this sum, divided by $\sigma_2^2$, is distributed like $\chi^2$ with $n - 1$ d.f.
Further, from the above definition of r,

$$\sqrt{n}\, r S_2 = \sum_i (x_i - \bar{x})(y_i - \bar{y})/\sqrt{n}\,S_1 = \sum_i (x_i - \bar{x})\, y_i/\sqrt{n}\,S_1.$$

We may take this sum for the second variate, $\eta_2$, since the sum of
the squares of the coefficients of the $y_i$ is unity, the orthogonal
condition is satisfied by the coefficients of $\eta_1$ and $\eta_2$, and the
variables x and y are independent. Since $\eta_2^2 = nr^2S_2^2$ it follows from
(i) that

$$\sum_{i=3}^{n} \eta_i^2 = n(1 - r^2)S_2^2.$$

Further, the $\eta_i/\sigma_2$ are independent
standard normal variates. Thus $nr^2S_2^2/\sigma_2^2$ and $n(1 - r^2)S_2^2/\sigma_2^2$ are
distributed independently like $\chi^2$ with 1 and $n - 2$ d.f. respectively;
or we may express it by saying that the former is a
$\chi_1^2$ and the latter a $\chi_{n-2}^2$. It follows that

$$r^2 = \frac{nr^2S_2^2}{nS_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nr^2S_2^2/\sigma_2^2 + n(1 - r^2)S_2^2/\sigma_2^2} = \frac{\chi_1^2}{\chi_1^2 + \chi_{n-2}^2},$$

and therefore, by §81, Theorem III, $r^2$ is a $\beta_1(\tfrac12, \tfrac12(n-2))$ variate,
with distribution

$$dP = \frac{1}{B(\tfrac12, \tfrac12(n-2))}\,(r^2)^{-1/2}(1 - r^2)^{(n-4)/2}\, d(r^2). \tag{24}$$

We may thus state the important

Theorem IV. For random samples of n pairs of values from an
uncorrelated bivariate normal population, $r^2$ is a Beta variate of the
first kind with parameters $\tfrac12$ and $\tfrac12(n - 2)$.

The expected value of $r^2$ for such samples is therefore $1/(n - 1)$,
and the s.e. of r is $1/\sqrt{n - 1}$. And since r ranges from $-1$ to 1, the
opposite values $\pm r$ giving the same value of $r^2$, the distribution
of r is

$$dP = \frac{1}{B(\tfrac12, \tfrac12(n-2))}\,(1 - r^2)^{(n-4)/2}\, dr. \tag{25}$$
In considering the linear regression of y on x we proved the
relation (cf. §27 (31))

$$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - Y_i)^2 + \sum_i (Y_i - \bar{y})^2, \tag{26}$$

where Y is the estimate of y from the regression equation. In the
present notation this corresponds to the identity

$$nS_2^2 = n(1 - r^2)S_2^2 + nr^2S_2^2. \tag{27}$$

We have just seen that, for samples from the above population,
these three sums are independent. Thus the sum of squares of
deviations from the line of regression is distributed independently
of the sum of squares of deviations due to regression in the sample.
When divided by $\sigma_2^2$ the three sums in (27) are distributed like $\chi^2$
with $n - 1$, $n - 2$ and 1 d.f. respectively, illustrating the additive
property of $\chi^2$ stated in §81.
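
The statement that the expected value of $r^2$ is $1/(n-1)$ for an uncorrelated normal population may be illustrated by artificial sampling. The following is a rough sketch only: the seed, the sample size n = 10 and the number of trials are arbitrary choices of ours, and the agreement obtained is subject to sampling error.

```python
import random

def sample_r_squared(n, rng):
    """Square of the sample correlation for n independent normal pairs."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(12345)           # fixed seed so that the run is repeatable
n, trials = 10, 20000
mean_r2 = sum(sample_r_squared(n, rng) for _ in range(trials)) / trials
print(mean_r2, 1 / (n - 1))          # empirical mean of r^2 beside 1/(n-1)
```

With samples of 10 pairs the mean of $r^2$ over many trials should fall near 1/9, the mean of a $\beta_1(\tfrac12, 4)$ variate.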


83. Distribution of regression coefficients and correlation ratios

The distribution of the linear regression coefficient, b, of y on x
follows from the above results. For

$$b = rS_2/S_1,$$

so that

$$\frac{b^2\sigma_1^2}{\sigma_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nS_1^2/\sigma_1^2}.$$

As we have just seen, the numerator and denominator of the second
member are distributed independently like $\chi^2$ with 1 and $n - 1$ d.f.
respectively. Consequently, by Theorem III, the quotient is a
$\beta_2(\tfrac12, \tfrac12(n-1))$ variate, and therefore $b^2\sigma_1^2/\sigma_2^2$ is a variate of the same
type. Hence we may state

Theorem V. In random samples of n pairs of values from an
uncorrelated bivariate normal population, in which the standard
deviations of the variables are $\sigma_1$ and $\sigma_2$, the linear regression coefficient
b of y on x is such that $b^2\sigma_1^2/\sigma_2^2$ is a Beta variate of the second kind with
parameters $\tfrac12$ and $\tfrac12(n - 1)$.

Since the mean value of a $\beta_2(l, m)$ variate is $l/(m - 1)$, the expected
value of $b^2$ in sampling is $\sigma_2^2/(n - 3)\sigma_1^2$. And, since the range of b is
from $-\infty$ to $+\infty$, the distribution of b is

$$dP = \frac{\sigma_1\sigma_2^{\,n-1}\, db}{B(\tfrac12, \tfrac12(n-1))\,(\sigma_2^2 + \sigma_1^2 b^2)^{n/2}}. \tag{28}$$

The distribution of the correlation ratio, $\eta$, of y on x may be
determined from the corresponding resolution of the sum of squares.
With the notation of §34 let the subscript i distinguish the particular
array of y's, while j indicates position in that array, $\bar{y}_i$ denoting the
mean of the y's in the ith vertical array, and $n_i$ the frequency in
that array. Then, in virtue of §34, we have the resolution

$$\sum_i\sum_j (y_{ij} - \bar{y})^2 = \sum_i\sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2, \tag{29}$$

the various sums being equal to the corresponding terms of the
identity

$$nS_2^2 = n(1 - \eta^2)S_2^2 + n\eta^2S_2^2. \tag{30}$$

Now the first member of (29), divided by $\sigma_2^2$, is distributed like $\chi^2$
with $n - 1$ d.f. Further, since $\sum_j$ denotes summation over the values
of any particular array, $\sum_j (y_{ij} - \bar{y}_i)^2/\sigma_2^2$ is distributed like $\chi^2$ with
$n_i - 1$ d.f. Summing for all the h arrays we see that the first sum in
the second member of (29), divided by $\sigma_2^2$, is a $\chi^2$ with $\sum_i (n_i - 1)$ or
$n - h$ d.f. And, because the means of the arrays are distributed
independently of their variances, it follows from §81, Theorem II,
that the last sum in (29), divided by $\sigma_2^2$, is a $\chi^2$ with $h - 1$ d.f.* We
may therefore write

$$\eta^2 = \frac{n\eta^2S_2^2/\sigma_2^2}{nS_2^2/\sigma_2^2} = \frac{\chi_{h-1}^2}{\chi_{h-1}^2 + \chi_{n-h}^2},$$

* See also Ex. 7 at the end of this chapter.

in which the subscript indicates the number of d.f. for the $\chi^2$.
Thus, in virtue of §81, Theorem III, $\eta^2$ is a $\beta_1(\tfrac12(h-1), \tfrac12(n-h))$
variate, and we may state

Theorem VI. For random samples of n pairs of values from an
uncorrelated bivariate normal population, the square of the correlation
ratio of y on x is a Beta variate of the first kind with parameters
$\tfrac12(h - 1)$ and $\tfrac12(n - h)$, where h is the number of arrays of y's.

The distribution of $\eta^2$ is thus

$$dP = \frac{1}{B(\tfrac12(h-1), \tfrac12(n-h))}\,(\eta^2)^{(h-3)/2}(1 - \eta^2)^{(n-h-2)/2}\, d(\eta^2), \tag{31}$$

and its mean value,* by §69 (23), is

$$E(\eta^2) = \frac{h - 1}{n - 1}. \tag{32}$$

* Cf. Fisher, 1922, 2, pp. 604-5.

EXAMPLES IX

1. Apply the $\chi^2$ test to show that the deaths of centenarians
recorded in the example of §18 may reasonably be regarded as a
random sample from a Poissonian population.

2. A certain hypothesis was tested by two similar experiments,
which gave $\chi^2$ = … for $\nu = 9$ and $\chi^2$ = … for $\nu = 11$. Show that
the two experiments combined give less reason for confidence in the
hypothesis than either experiment alone.

3. Prove by mathematical induction that the sum of the squares
of n independent standard normal variates conforms to the
distribution (2) of §74. (See Fisher, 1935, 1, pp. 353-4.)

4. Geometrical proof of the $\chi^2$ distribution. The following geometrical
proof, that the sum of the squares of n independent standard
normal variates has a distribution given by (2) of §74, is
due to Fisher (1935, 1, p. 354). As in §74 the probability that the
values of the variates will fall simultaneously in the respective
intervals $dx_i$ is

$$dp = (2\pi)^{-n/2}\exp(-\tfrac12\chi^2)\,dx_1\,dx_2 \dots dx_n, \tag{i}$$

where $\chi^2 = \sum_i x_i^2$.

Let us regard the $x_i$ as coordinates of the current point P in Euclidean
space of n dimensions. Then dp is the probability that P will fall
in the element of volume $dx_1\,dx_2 \dots dx_n$. The coefficient of this
element of volume in (i) is therefore the probability density for this
space, and it is proportional to $\exp(-\tfrac12\chi^2)$. Now we can express, in
terms of $\chi$ and $d\chi$, an element of volume in which the value of $\chi$
may be regarded as constant. For, since $\chi^2$ is the square of the
distance of P from the origin, $\chi$ is constant over the surface of a
hypersphere with radius $\chi$ and centre at the origin. The volume
enclosed by this hypersphere is proportional to $\chi^n$, and the element
of volume between this and the adjacent hypersphere of radius
$\chi + d\chi$ is proportional to $d(\chi^n)$, that is to $\chi^{n-1}\,d\chi$. Multiplying this
by the above value of the probability density, we see that the probability that
P will fall in the region bounded by the two hyperspheres is proportional
to $\chi^{n-1}\exp(-\tfrac12\chi^2)\,d\chi$. This is the probability that the
value of $\chi$ from the random sample will fall in the interval $d\chi$. The
above probability is clearly proportional to

$$(\tfrac12\chi^2)^{(n-2)/2}\exp(-\tfrac12\chi^2)\,d(\tfrac12\chi^2),$$

and, since $\chi^2$ must lie between 0 and $+\infty$, the constant factor is
$1/\Gamma(\tfrac12 n)$, so that the integral of the probability throughout this
range may be unity. Since $\chi^2$ is in the interval $d(\chi^2)$ when $\chi$ is in
the interval $d\chi$, it follows that the distribution of $\chi^2$ is given by
§74 (2).

For still another proof of the $\chi^2$ distribution see Plummer, 1940,
1, p. 246.

5. Homogeneity of several estimates of the population variance.
Suppose that k independent samples have furnished estimates $s_i^2$ of
the population variance, based on $\nu_i$ ($i = 1, 2, \dots, k$) d.f. Are these
estimates such that the samples may be regarded as drawn from
the same population? In other words, are the estimates homogeneous?

On the hypothesis that the samples are from the same population,
an unbiased estimate $s^2$ of the variance of the population is

$$s^2 = \sum_i \nu_i s_i^2 \Big/ \sum_i \nu_i = \sum_i \nu_i s_i^2/\nu,$$

since the expected value of this quantity is the variance of the
population, in virtue of §54 (8). Bartlett* has shown that the
statistic

$$\chi^2 = \frac{1}{C}\Big\{\nu\log_e s^2 - \sum_i \nu_i\log_e s_i^2\Big\},$$

where

$$C = 1 + \frac{1}{3(k-1)}\Big\{\sum_i \frac{1}{\nu_i} - \frac{1}{\nu}\Big\},$$

is distributed approximately as $\chi^2$ with $k - 1$ d.f. The value of $\chi^2$
calculated from the data will tell whether the hypothesis of homogeneity
is reasonable.

* Proc. Roy. Soc. A, vol. 160, 1937, pp. 268-82. See also Neyman and
Pearson, 1931, 3.

6. Show that the estimates 3.8, 4.4, 8.1, 6.1, and 9.4 of the
population variance, based on 5, 8, 6, 7 and 4 d.f. respectively, may
be regarded as homogeneous according to the test of Ex. 5 ($k = 5$,
$\chi^2 = 1.45$).

7. That the sum $\sum_i n_i(\bar{y}_i - \bar{y})^2/\sigma_2^2$ of §83 is distributed like $\chi^2$
with $h - 1$ d.f. may also be shown as follows. Since $\bar{y}$ is the weighted
mean of the $\bar{y}_i$ we have

$$\sum_i n_i(\bar{y}_i - \mu')^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 + n(\bar{y} - \mu')^2. \tag{ii}$$

Now $\bar{y}_i - \mu'$ is distributed normally about zero as mean with
variance $\sigma_2^2/n_i$. Consequently $n_i(\bar{y}_i - \mu')^2/\sigma_2^2$ is a $\chi^2$ with 1 d.f.; and,
on summing for all the arrays, we see that the first member of (ii)
divided by $\sigma_2^2$ is a $\chi^2$ with h d.f. Similarly, $\bar{y} - \mu'$ is distributed
normally about zero as mean with variance $\sigma_2^2/n$; and the last term
in (ii), divided by $\sigma_2^2$, is therefore a $\chi^2$ with 1 d.f. Theorem II of §81
then shows that the first sum on the right of (ii), divided by $\sigma_2^2$, is
a $\chi^2$ with $h - 1$ d.f., as stated.

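8. Taking the correlation ratio as positive, deduce from §83 (31)
that the mean value of the correlation ratio of y on x, for samples
from an uncorrelated bivariate normal population, is

$$\Gamma(\tfrac12 h)\,\Gamma(\tfrac12(n-1)) \big/ \Gamma(\tfrac12(h-1))\,\Gamma(\tfrac12 n).$$

9. Integrating by parts show that, if r is a positive integer,

$$\frac{1}{r!}\int_x^\infty t^r e^{-t}\,dt = \frac{x^r e^{-x}}{r!} + \frac{1}{(r-1)!}\int_x^\infty t^{r-1}e^{-t}\,dt,$$

and, by continuing the process, show that the value of the integral is

$$e^{-x}\Big\{1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^r}{r!}\Big\}.$$

Hence show that the probability, in random sampling, that the
value of $\chi^2$ with $\nu$ (even) d.f. will exceed $\chi_0^2$ is obtained from this
expression by putting $x = \tfrac12\chi_0^2$ and $r = \tfrac12(\nu - 2)$. (Cf. Fisher, 1935,
1, pp. 356-7.)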

10. The variables x, y are normally correlated with coefficient $\rho$.
Show that u and v, defined by

$$u = x/\sigma_1 + y/\sigma_2, \qquad v = x/\sigma_1 - y/\sigma_2,$$

are independent normal variates with variances $2(1 + \rho)$ and $2(1 - \rho)$
respectively; and that, if $r_{uv}$ is the correlation between u and v in
random samples of n pairs from the bivariate population, $r_{uv}^2$ is a
$\beta_1(\tfrac12, \tfrac12(n-2))$ variate.
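
Returning to the test of homogeneity in Exs. 5 and 6, the computation may be sketched as follows. We assume here the usual form of Bartlett's statistic, $\chi^2 = \{\nu\log_e s^2 - \sum\nu_i\log_e s_i^2\}/C$ with $C = 1 + \{\sum 1/\nu_i - 1/\nu\}/3(k-1)$; the function name is our own.

```python
import math

def bartlett_chi2(variances, dofs):
    """Bartlett's statistic for k variance estimates s_i^2 on nu_i d.f.
    Returns (statistic, k - 1). Natural logarithms are used throughout."""
    k = len(variances)
    nu = sum(dofs)
    pooled = sum(v * s2 for v, s2 in zip(dofs, variances)) / nu   # pooled s^2
    m = nu * math.log(pooled) - sum(v * math.log(s2)
                                    for v, s2 in zip(dofs, variances))
    c = 1.0 + (sum(1.0 / v for v in dofs) - 1.0 / nu) / (3.0 * (k - 1))
    return m / c, k - 1

# Data of Ex. 6
stat, dof = bartlett_chi2([3.8, 4.4, 8.1, 6.1, 9.4], [5, 8, 6, 7, 4])
print(round(stat, 2), dof)
```

For the data of Ex. 6 this gives a value close to the 1.45 quoted there, with 4 d.f. Since the entry for P = 0.50 with 4 d.f. in Table 3 is 3.36, so small a value gives no ground for doubting homogeneity.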



CHAPTER X 


FURTHER TESTS OF SIGNIFICANCE. 
SMALL SAMPLES 


84. Small samples 

The use of the standard errors of statistics in the tests of Chapters
VI and VII depends upon the fact that, in the case of large samples,
the sampling distributions of many statistics are approximately
normal, or at any rate unimodal with the property that a value of
the statistic, deviating from its mean by more than two or three
times its s.e., is very unlikely. For small samples, however, the
distributions of statistics are often far from normal. Moreover,
the estimate of a parameter of the population made from a small
sample is not at all reliable. For these reasons the use of standard
errors in connection with such samples is very limited. The chief
concern of the theory of small samples is with the distributions of
various statistics, and the applications of tests of significance based
upon these distributions. In each application we test a hypothesis
concerning the source of the sample. This hypothesis may be, for
instance, that a certain parameter of the population has a specified
value, or that two given samples were drawn from the same population.
In each instance the test employed enables us to form a conclusion
based on considerations of probability. The nature of the
tests is illustrated by the $\chi^2$ test of §78, which is applicable to
samples of any size. But, as already indicated, the test of goodness
of fit considered in §79 can be applied only to large samples.

We remind the reader of two important results obtained earlier.
First, in simple sampling from a population with mean $\mu$ and
variance $\sigma^2$, the distribution of the mean $\bar{x}$ of the sample of n
members has $\mu$ for its mean and $\sigma^2/n$ for its variance (see §50).
This result holds whether the sample is large or small, and whether
the population is normal or not. In the case of a normal population
the distribution of $\bar{x}$ is also normal (see §§ 22 or 82). Secondly, the
statistic $s^2$ defined by

$$(n - 1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{1}$$

is an unbiased estimate of the population variance $\sigma^2$, in the sense
that $E(s^2) = \sigma^2$ (cf. §54 (8)). This estimate of $\sigma^2$ is said to be based
on $n - 1$ d.f., since the variates $x_i - \bar{x}$ are not independent, being
connected by the linear relation $\sum (x_i - \bar{x}) = 0$. The number of
independent variates is thus only $n - 1$; and this agrees with the
result of §77, that $(n - 1)s^2/\sigma^2$ is distributed like $\chi^2$ with $n - 1$ d.f.

The following discussion applies to samples of any size. It is
assumed, however, that the populations from which the samples
are drawn are normal. The results obtained are therefore strictly
true only in this case. But they are approximately true, and may be
usefully applied, in most cases in which the departure of the population
from normality is not very marked.


'Student's' Distribution


85. The statistic t and its distribution

We have seen that, in simple sampling from a normal population
of mean $\mu$ and variance $\sigma^2$, the deviation $\bar{x} - \mu$ is a normal variate,
with mean zero and s.d. $\sigma/\sqrt{n}$. The quotient of $\bar{x} - \mu$ by $\sigma/\sqrt{n}$ is
therefore distributed normally with unit s.d. If, however, in place
of the constant $\sigma$ we use the variable estimate s obtained from the
sample, we have the statistic

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \tag{2}$$

which is not normally distributed. The distribution of t was first
found by W. S. Gosset, who wrote under the nom de plume of
'Student'. It follows very simply from the theorems of §§67, 73
and 77. For, from its definition,

$$t^2 = \frac{n(\bar{x} - \mu)^2}{s^2} = \frac{\nu n(\bar{x} - \mu)^2}{nS^2},$$

where $nS^2 = \sum (x_i - \bar{x})^2$, and $\nu$ is the number of d.f., $n - 1$, on which
the estimate $s^2$ is based. Hence

$$\frac{t^2}{\nu} = \frac{n(\bar{x} - \mu)^2/2\sigma^2}{nS^2/2\sigma^2}.$$

But, in virtue of §67 (Theorem I) and §77, the numerator in the
second member is a Gamma variate with parameter $\tfrac12$, and the
denominator a Gamma variate with parameter $\tfrac12\nu$; and these are
distributed independently of each other (cf. §82 or Ex. VIII, 11).
Consequently $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate; and its distribution is
obtained by substituting $t^2/\nu$ for x in §72 (27), with $l = \tfrac12$ and
$m = \tfrac12\nu$. Thus the probability that, in random sampling, the value
of $t^2$ will fall in the interval $d(t^2)$ is

$$dP = \frac{1}{\sqrt{\nu}\,B(\tfrac12, \tfrac12\nu)}\,(t^2)^{-1/2}\Big(1 + \frac{t^2}{\nu}\Big)^{-(\nu+1)/2} d(t^2). \tag{3}$$

By integrating with respect to $t^2$ from a fixed value $t_1^2$ to infinity,*
we obtain the probability that the value of $t^2$ will exceed $t_1^2$. The
accompanying table contains extracts from a more complete one
given by Fisher. In the body of the table are given the values of $|t|$
corresponding to certain fixed values of $\nu$ and P. Thus, for a specified
number $\nu$ of d.f., the value of P at the top of the column is the
probability that, in random sampling, the numerical value of t in
the body of the table will be exceeded.

The range of $t^2$ is from 0 to $\infty$, and its distribution is given by (3).
The statistic t, however, ranges from $-\infty$ to $+\infty$, and its distribution
is therefore

$$dP = \frac{1}{\sqrt{\nu}\,B(\tfrac12, \tfrac12\nu)}\Big(1 + \frac{t^2}{\nu}\Big)^{-(\nu+1)/2} dt, \tag{4}$$

the factor 2 disappearing, since the integral of dP over the whole
range of variation of t must be unity. The distribution (4) is spoken
of as the t distribution corresponding to $\nu$ d.f. And from the above
argument it is clear that any statistic whose range is from $-\infty$ to
$+\infty$, and which is such that $t^2/\nu$ is a $\beta_2(\tfrac12, \tfrac12\nu)$ variate, conforms to
the t distribution for $\nu$ d.f. This important result may also be
expressed in the form of

Theorem I. A statistic t conforms to the t distribution for $\nu$ d.f. if its
range is from $-\infty$ to $+\infty$, and $t^2/\nu$ is expressible as the quotient of two
independent variates, which are distributed like $\chi^2$ with 1 and $\nu$ d.f.
respectively.
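
The probability that $|t|$ exceeds a given value may be found approximately by numerical integration of the t distribution. The sketch below is not the method of evaluation referred to in the text: it assumes the density in the equivalent form $\Gamma(\tfrac12(\nu+1))\,(1 + t^2/\nu)^{-(\nu+1)/2}/\sqrt{\nu\pi}\,\Gamma(\tfrac12\nu)$ and applies Simpson's rule; the function names are our own.

```python
import math

def t_density(t, nu):
    """Ordinate of the t distribution for nu d.f."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(t * t / nu))

def prob_abs_t_exceeds(t0, nu, steps=2000):
    """P(|t| > t0): one minus the integral of the density over (-t0, t0),
    by Simpson's rule with an even number of steps."""
    h = 2.0 * t0 / steps
    total = t_density(-t0, nu) + t_density(t0, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(-t0 + i * h, nu)
    return 1.0 - total * h / 3.0

print(prob_abs_t_exceeds(1.0, 1))     # one d.f.: the probability is one half
print(prob_abs_t_exceeds(2.23, 10))   # close to 0.05
```

For 1 d.f. the value t = 1 is exceeded numerically with probability one half, and for 10 d.f. the value 2.23 with probability near 0.05.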


* For a method of evaluating this integral see Fisher, 1935, 1. 

Table 4. Values of mod. t with a probability P of being exceeded 
in random sampling 

$\nu$ = number of degrees of freedom 

  $\nu$     P = 0.50     0.10     0.05     0.02     0.01
    1       1.000      6.31    12.71    31.82    63.66
    2       0.816      2.92     4.30     6.96     9.92
    3       0.765      2.35     3.18     4.54     5.84
    4       0.741      2.13     2.78     3.75     4.60
    5       0.727      2.02     2.57     3.36     4.03
    6       0.718      1.94     2.45     3.14     3.71
    7       0.711      1.90     2.36     3.00     3.50
    8       0.706      1.86     2.31     2.90     3.36
    9       0.703      1.83     2.26     2.82     3.25
   10       0.700      1.81     2.23     2.76     3.17
   11       0.697      1.80     2.20     2.72     3.11
   12       0.695      1.78     2.18     2.68     3.06
   13       0.694      1.77     2.16     2.65     3.01
   14       0.692      1.76     2.14     2.62     2.98
   15       0.691      1.75     2.13     2.60     2.95
   16       0.690      1.75     2.12     2.58     2.92
   17       0.689      1.74     2.11     2.57     2.90
   18       0.688      1.73     2.10     2.55     2.88
   19       0.688      1.73     2.09     2.54     2.86
   20       0.687      1.72     2.09     2.53     2.84
   21       0.686      1.72     2.08     2.52     2.83
   22       0.686      1.72     2.07     2.51     2.82
   23       0.685      1.71     2.07     2.50     2.81
   24       0.685      1.71     2.06     2.49     2.80
   25       0.684      1.71     2.06     2.48     2.79
   26       0.684      1.71     2.06     2.48     2.78
   27       0.684      1.70     2.05     2.47     2.77
   28       0.683      1.70     2.05     2.47     2.76
   29       0.683      1.70     2.04     2.46     2.76
   30       0.683      1.70     2.04     2.46     2.75
   35       0.682      1.69     2.03     2.44     2.72
   40       0.681      1.68     2.02     2.42     2.71
   45       0.680      1.68     2.02     2.41     2.69
   50       0.679      1.68     2.01     2.40     2.68
   60       0.678      1.67     2.00     2.39     2.66
   $\infty$       0.674      1.64     1.96     2.33     2.58

Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 
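
The entries of Table 4 can be checked numerically. The following is a minimal sketch, assuming the usual form of the $t$ density for $\nu$ d.f. (the density referred to as (4) is not reproduced in this part of the text), and using Simpson's rule in place of the method of evaluation cited in the footnote. The function names are illustrative only.

```python
import math

def t_density(t, nu):
    """Density of the t distribution for nu d.f. (standard form, assumed)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

def prob_exceeded(t, nu, steps=2000):
    """P(|T| > t): one minus twice the Simpson integral of the density on (0, t).

    `steps` must be even for Simpson's rule.
    """
    h = t / steps
    total = t_density(0.0, nu) + t_density(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, nu)
    return 1.0 - 2.0 * total * h / 3.0

# Tabulated values for 8 d.f. should give probabilities close to the column heads.
for t_value in (0.706, 1.86, 2.31, 2.90, 3.36):
    print(t_value, round(prob_exceeded(t_value, 8), 3))
```

Because the table is rounded to two decimal places, the computed probabilities agree with the column headings only approximately.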

The reader should sketch the probability curves for $t^2$ and $t$. 
The former is asymptotic to both axes, with ordinate which decreases 
continuously as $t^2$ increases. The probability curve of $t$ is symmetrical 
about the line $t = 0$. It is asymptotic to the $t$-axis at each end, and 
the maximum ordinate is that for $t = 0$. Thus small values of $t^2$ are 
more likely than larger values. In this respect $t^2$ differs from $\chi^2$ 
when the latter has more than 2 d.f. 

86. Test for an assumed population mean 

The statistic $t$ and the above table provide a means of testing an 
assumed value $\mu$ for the mean of the normal population from which 
the random sample was drawn. On the hypothesis that $\mu$ is the 
true value, the equation (2) and the values of $\bar{x}$ and $s$ derived from 
the sample enable us to calculate $t$. The table then gives the 
probability $P$ that this value will be exceeded numerically in random 
sampling from a normal population with mean $\mu$. If $P$ is less than 
0.05 we regard our value of $t$ as significant. If $P$ is less than 0.01 
we regard it as highly significant. A significant value of $t$ throws 
doubt on the truth of the hypothesis that $\mu$ is the mean of the 
population. 

Example. A random sample of nine from the men of a large city gave a 
mean height of 68 in.; and the unbiased estimate $s^2$ of the population variance 
found from the sample was 4.5 in.$^2$ Are these data consistent with the 
assumption of a mean height of 68.5 in. for the men of the city? 

Large populations of heights of men are known to be approximately 
normal. For the given sample 

$$\bar{x} = 68.0, \quad \nu = 9 - 1 = 8, \quad s = \sqrt{4.5} = 2.12,$$

and therefore, on the hypothesis that $\mu = 68.5$, we have 

$$|t| = |\bar{x} - \mu|\,\sqrt{n}/s = (0.5) \times 3/2.12 = 0.707.$$

From the table we find that, for $\nu = 8$, the probability that this value of $t$ 
will be exceeded numerically in random sampling is about 0.50. The value 
is therefore not at all significant, and the test provides no evidence against 
the assumption of a population mean of 68.5 in. 

If we make the assumption that $\mu = 69.5$ or 66.5 in. we obtain a value of 
$|t|$ three times as great as before, viz. 2.12. The probability that, for $\nu = 8$, 
this value will be exceeded in random sampling is greater than 0.05, and the 
new value of $t$ is still not significant. There are thus fairly wide limits for the 
assumed population mean which, with the data of the sample, will provide 
a value of $t$ which is not significant. These limits we proceed to consider. 
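
The arithmetic of this example can be sketched in a few lines. The function name and the constant holding the tabulated 5 % value for 8 d.f. are illustrative assumptions; the critical value itself is read from Table 4 rather than computed.

```python
import math

def t_for_assumed_mean(sample_mean, assumed_mean, s, n):
    """|t| = |x_bar - mu| * sqrt(n) / s, with s the root of the unbiased variance estimate."""
    return abs(sample_mean - assumed_mean) * math.sqrt(n) / s

T_05_FOR_8_DF = 2.31  # value of |t| exceeded with probability 0.05, from Table 4

s = math.sqrt(4.5)  # unbiased estimate of variance was 4.5 sq. in.
for mu in (68.5, 69.5, 66.5):
    t = t_for_assumed_mean(68.0, mu, s, 9)
    verdict = "significant" if t > T_05_FOR_8_DF else "not significant"
    print(f"assumed mean {mu}: |t| = {t:.3f}, {verdict} at the 5 % level")
```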

87. Fiducial limits* for the population mean 

Suppose that a certain sample from a normal population has a 
mean $\bar{x}$, and provides an unbiased estimate $s^2$ of the population 
variance based on $\nu$ d.f. We wish to find limits to the assumed 
population mean $\mu$ so that, with the data of the sample, it will lead 
to a value of $t$ that is not significant. If our choice is the 5 % level 
of significance, we define the 95 % confidence range for $\mu$ as that 
range of values of $\mu$ which, with the data of the sample, will furnish 
a value of $|t|$ less than the value $t_{0.05}$ which corresponds to $P = 0.05$. 
This requires 

$$|\bar{x} - \mu|\,\sqrt{n}/s < t_{0.05},$$

so that 

$$\bar{x} - s\,t_{0.05}/\sqrt{n} < \mu < \bar{x} + s\,t_{0.05}/\sqrt{n}. \tag{5}$$

Consequently $\mu$ must lie within the range extending from $\bar{x} - s\,t_{0.05}/\sqrt{n}$ 
to $\bar{x} + s\,t_{0.05}/\sqrt{n}$, which is called the 95 % confidence range for $\mu$ 
corresponding to the given sample; and the bounding values of this range 
are the corresponding confidence limits, or fiducial limits, for the 
mean of the population. Similarly we have confidence ranges and 
limits corresponding to other levels of significance. In each case the 
appropriate value of $t$ is found from the table of $t$. 

* Cf. Fisher, 1930, 3 and 1933, 1. 

Example 1. Find the 95 % fiducial limits for $\mu$ corresponding to the sample 
in the example of § 86. 

Here $\nu = 8$, $t_{0.05} = 2.31$, $s = 2.12$, $n = 9$. Hence 

$$s\,t_{0.05}/\sqrt{n} = (2.12)(2.31)/3 = 1.63.$$

The required limits are therefore $68 \pm 1.63$, that is 66.37 and 69.63 in. 

Example 2. Find the 98 % fiducial limits for $\mu$ corresponding to the same 
sample. 

From the table we find that $t_{0.02} = 2.90$ is the value of $t$ which is exceeded 
numerically with a probability of 2 %. Then 

$$s\,t_{0.02}/\sqrt{n} = (2.12)(2.90)/3 = 2.05,$$

and the required limits are $68 \pm 2.05$, that is 65.95 and 70.05 in. 
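
A short sketch of the calculation of fiducial limits from formula (5) follows. The function name is an assumption for illustration, and the value of $t$ is supplied by the user from the table of $t$ rather than computed.

```python
import math

def fiducial_limits(sample_mean, s, n, t_tabulated):
    """Limits x_bar -/+ s * t / sqrt(n), with t read from the table of t."""
    half_width = s * t_tabulated / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Sample of nine with mean 68 in. and s = 2.12 in.
low95, high95 = fiducial_limits(68.0, 2.12, 9, 2.31)   # 95 % limits, 8 d.f.
low98, high98 = fiducial_limits(68.0, 2.12, 9, 2.90)   # 98 % limits, 8 d.f.
print(round(low95, 2), round(high95, 2))
print(round(low98, 2), round(high98, 2))
```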

88. Comparison of the means of two samples 

Given two independent samples of $n_1$ and $n_2$ members, with means 
$\bar{x}_1$ and $\bar{x}_2$ respectively, we may use the $t$ distribution to decide 
whether the means differ significantly, or whether the two samples 
may be regarded as drawn from the same normal population.* We 
test the hypothesis that they are from the same normal population. 

* Cf. Fisher, 1926, 1, p. 90 et seq. 

Let $x_i$ ($i = 1, \ldots, n_1$) be the values of the variable in the first sample, 
and $x'_j$ ($j = 1, \ldots, n_2$) those in the second. Then the sums of squares 
for the two samples are 

$$n_1 S_1^2 = \sum_i (x_i - \bar{x}_1)^2, \qquad n_2 S_2^2 = \sum_j (x'_j - \bar{x}_2)^2$$

respectively. Also, if $\sigma^2$ is the variance of the population, $n_1 S_1^2/\sigma^2$ 
and $n_2 S_2^2/\sigma^2$ are distributed like $\chi^2$ with $n_1 - 1$ and $n_2 - 1$ d.f. 
respectively. Consequently $(n_1 S_1^2 + n_2 S_2^2)/\sigma^2$ is a $\chi^2$ with $\nu$ d.f., 
where 

$$\nu = n_1 + n_2 - 2.$$

An unbiased estimate of the population variance, obtained from the 
samples, is 

$$s^2 = (n_1 S_1^2 + n_2 S_2^2)/\nu,$$

since 

$$E(\nu s^2) = E(n_1 S_1^2 + n_2 S_2^2) = (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2 = \nu\sigma^2,$$

so that $E(s^2) = \sigma^2$. 

Now, in virtue of the hypothesis, $\bar{x}_1$ and $\bar{x}_2$ are normally 
distributed about the population mean with variances $\sigma^2/n_1$ and 
$\sigma^2/n_2$ respectively. Therefore, since the samples are independent, the 
difference $\bar{x}_1 - \bar{x}_2$ is normally distributed about zero with variance 
$\sigma^2(1/n_1 + 1/n_2)$. If then we define a statistic $t$ by the equation 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}},$$

we have 

$$\frac{t^2}{\nu} = \frac{(\bar{x}_1 - \bar{x}_2)^2 / \sigma^2(1/n_1 + 1/n_2)}{(n_1 S_1^2 + n_2 S_2^2)/\sigma^2}. \tag{6}$$

The numerator and denominator of the second member are 
distributed independently like $\chi^2$ with 1 and $\nu$ d.f. respectively. Hence, 
by the Theorem of § 85, the statistic $t$ conforms to the $t$ distribution 
for $\nu$ d.f. It is the quotient of the normal deviate $\bar{x}_1 - \bar{x}_2$ by the 
estimate of its s.e. derived from the samples. If the value of $t$ 
obtained from the samples is significant, our hypothesis is 
discredited. 

We might vary the above argument to test the hypothesis that 
the samples were drawn from different normal populations, with 
means $\mu$ and $\mu'$ respectively, but the same variance. In this case 
$\bar{x}_1 - \mu$ and $\bar{x}_2 - \mu'$ are normally distributed about zero with variances 
$\sigma^2/n_1$ and $\sigma^2/n_2$ respectively. Hence their difference 

$$(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')$$

is normally distributed about zero with variance $\sigma^2(1/n_1 + 1/n_2)$. 
This difference takes the place of $\bar{x}_1 - \bar{x}_2$ in the above argument, so 
that 

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')}{s\sqrt{1/n_1 + 1/n_2}} \tag{7}$$

conforms to the $t$ distribution for $\nu$ d.f. 


Example 1. Show that the 95 % fiducial limits for the difference of the 
means of the populations are $\bar{x}_1 - \bar{x}_2 \pm s\,t_{0.05}\sqrt{1/n_1 + 1/n_2}$, where $t_{0.05}$ 
is the value of $t$ corresponding to $P = 0.05$ and $\nu$ d.f. If zero lies outside the 
above range of $\mu - \mu'$, we conclude that the difference between the means of 
the samples is significant of a difference between the means of the populations. 

Example 2. The heights of the ten men of a random sample from an 
unknown population gave a mean of 69 in., and a sum of squares of deviations 
from the mean equal to 42 in.$^2$ Apply the $t$ test to the hypothesis that this 
sample is from the same population as that of the example in § 86. 

From the data we have 

$$\bar{x}_1 = 68, \quad \bar{x}_2 = 69, \quad n_1 S_1^2 = 36, \quad n_2 S_2^2 = 42, \quad n_1 = 9, \quad n_2 = 10.$$

The estimate $s^2$ of the population variance is 

$$s^2 = (36 + 42)/17 = 4.59,$$

so that $s = 2.14$. The value of $t$ from the sample is then, numerically, 

$$|t| = \frac{1}{2.14}\sqrt{\frac{90}{19}} = 1.02.$$

For $\nu = 17$ this value is not at all significant. The test therefore provides no 
evidence against the hypothesis. 
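
The comparison of two sample means can be sketched as follows, working directly from the means and the sums of squares of deviations. The function name and return layout are illustrative assumptions; significance is still judged against the table of $t$.

```python
import math

def pooled_t(mean1, sum_sq1, n1, mean2, sum_sq2, n2):
    """t for the difference of two means, with pooled variance estimate.

    sum_sq1 and sum_sq2 are the sums of squares of deviations from each
    sample mean (n1*S1^2 and n2*S2^2 in the notation of the text).
    """
    nu = n1 + n2 - 2
    s_squared = (sum_sq1 + sum_sq2) / nu
    t = (mean1 - mean2) / math.sqrt(s_squared * (1 / n1 + 1 / n2))
    return t, s_squared, nu

t, s_squared, nu = pooled_t(68.0, 36.0, 9, 69.0, 42.0, 10)
print(f"s^2 = {s_squared:.2f}, |t| = {abs(t):.2f}, d.f. = {nu}")
```

Carrying full precision for $s$ gives a value of $|t|$ very slightly below the one obtained with $s$ rounded to 2.14; both round to 1.02.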


89. Significance of an observed correlation 

The distribution and table of $t$ may also be used to test the 
significance of a value $r$ of the correlation coefficient, given by a 
random sample of $n$ pairs of values from a bivariate normal 
population. The s.e. of $r$ may not be used in the case of small samples, 
since the distribution of $r$ is then far from normal. We have seen 
that, if the variables in the normal population are uncorrelated, 
the value of $r^2$ in random samples of $n$ pairs is a $\beta_1(\tfrac12, \tfrac12(n-2))$ 
variate. Therefore, by Theorem VII of § 72, $r^2/(1 - r^2)$ is a 
$\beta_2(\tfrac12, \tfrac12(n-2))$ variate. Thus, if $t$ is defined by 

$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2}, \tag{8}$$

then $t^2/(n - 2)$ is a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, so that $t$ conforms to the 
$t$ distribution for $n - 2$ d.f. 

To test the significance of the sample value of $r$, we make the 
assumption that the variables in the population are uncorrelated. 
A simple calculation then gives the value of $t$ corresponding to the 
sample; and the table of $t$ then tells whether the value obtained is a 
rare one. If it is, our assumption is discredited, and we conclude 
that the variables in the population are probably correlated. 


Example 1. Is a correlation coefficient of 0.5 significant, if obtained 
from a random sample of 11 pairs of values from a normal population? 

Here $r = \tfrac12$, $\nu = 9$, $t = \tfrac12 \times 3 \div \tfrac12\sqrt{3} = \sqrt{3} = 1.73$. From the table we find 
that the probability of obtaining a value of $t$ larger than this is greater than 
0.10. Hence there is no reason to suspect the hypothesis of uncorrelated 
variables in the population. The value 0.5 is not significant. 

Example 2. Find the least value of $r$, in a sample of 27 pairs from a normal 
population, that is significant at the 5 % level. 

Here $\nu = 25$ and, at the 5 % level of significance, $t = 2.06$. Hence for $r$ to 
be significant we must have 

$$\frac{5r}{\sqrt{1 - r^2}} > 2.06,$$

which requires $r^2 > 0.145$ and therefore $|r| > 0.38$. Values of $r$ numerically 
less than 0.38 are not significant at the 5 % level. 
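
Both examples reduce to formula (8) and its inverse. The sketch below assumes illustrative function names; solving (8) for $r$ gives $r^2 = t^2/(t^2 + \nu)$, which is how the least significant value is obtained here.

```python
import math

def t_from_r(r, n):
    """Formula (8): t = r * sqrt(n - 2) / sqrt(1 - r^2), for n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def least_significant_r(t_tabulated, nu):
    """Invert (8): the least |r| whose t reaches the tabulated value for nu d.f."""
    return t_tabulated / math.sqrt(t_tabulated ** 2 + nu)

print(round(t_from_r(0.5, 11), 2))            # Example 1: r = 0.5, 11 pairs
print(round(least_significant_r(2.06, 25), 3))  # Example 2: 27 pairs, 5 % level
```

The same inversion, applied with the tabulated 5 % values of $t$, reproduces the entries of a table of least significant $r$ for each $\nu$.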


The accompanying short table gives the least values of $|r|$ that 
are significant at the 5 % level, for samples of different sizes from a 
normal population. These values may be calculated as in Ex. 2 
above. The number $\nu$ of degrees of freedom is $n - 2$, where $n$ is the 
number of pairs of values in the sample. 

Table 5. Minimum values of r that are significant at the 5 % level 

  $\nu$      r       $\nu$      r       $\nu$      r       $\nu$      r
    4    0.811      11    0.553      18    0.444      45    0.288
    5    0.755      12    0.532      19    0.433      50    0.273
    6    0.707      13    0.514      20    0.423      60    0.250
    7    0.666      14    0.497      25    0.381      70    0.232
    8    0.632      15    0.482      30    0.349      80    0.217
    9    0.602      16    0.468      35    0.325      90    0.205
   10    0.576      17    0.456      40    0.304     100    0.195

Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 


90. Significance of an observed regression coefficient 

The significance of an observed value, $b$, of the linear regression 
coefficient of $y$ on $x$, in a random sample from a normal population, 
may also be tested by the $t$ distribution and table. It was shown in 
§ 83 (Theorem V) that, for random samples of $n$ pairs from an 
uncorrelated normal population, $b^2\sigma_1^2/\sigma_2^2$ is a $\beta_2(\tfrac12, \tfrac12(n-1))$ variate. 
Consequently the statistic $t$ defined by 

$$t = b\,\sigma_1\sqrt{n - 1}/\sigma_2 \tag{9}$$

conforms to the $t$ distribution for $n - 1$ d.f. But, since $\sigma_1$ and $\sigma_2$ 
are usually unknown, this relation is not of much use in providing 
a test of significance for $b$. However, a similar result is free from this 
objection. For, from the relation 

$$b = rS_2/S_1, \tag{10}$$

in which $S_1^2$ and $S_2^2$ are the variances of $x$ and $y$ in the sample, it 
follows that 

$$\frac{b^2 \sum (x_i - \bar{x})^2}{\sum (y_i - Y_i)^2} = \frac{n r^2 S_2^2/\sigma_2^2}{n(1 - r^2) S_2^2/\sigma_2^2},$$

where as usual $Y_i$ denotes the estimate of $y_i$ found from the regression 
equation. Now, by § 82, the numerator and denominator of the 
second member are distributed like $\chi^2$ with 1 and $n - 2$ d.f. 
respectively. The quotient is therefore a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, and the 
statistic $t$ defined by 

$$t = b\sqrt{\frac{(n - 2)\sum (x_i - \bar{x})^2}{\sum (y_i - Y_i)^2}} \tag{11}$$

conforms to the $t$ distribution for $n - 2$ d.f., and may be used to 
test the significance of the value of $b$ found from the sample. More 
generally Fisher* has shown that, if the variables in the population 
are correlated, with $\beta$ as coefficient of regression of $y$ on $x$, the 
statistic which conforms to the $t$ distribution is obtained from (11) 
on replacing $b$ by $(b - \beta)$. 
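
As an illustration, the statistic (11) can be computed directly from paired data. The following is only a sketch: the function name and the six pairs of values are invented for the purpose, and the optional argument `beta` covers the more general case in which $b$ is replaced by $(b - \beta)$. With `beta` equal to zero, the result coincides with the value of $t$ obtained from the sample correlation $r$ by $t = r\sqrt{n-2}/\sqrt{1-r^2}$.

```python
import math

def regression_t(xs, ys, beta=0.0):
    """Return (b, t, r): regression coefficient of y on x, statistic (11), correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Sum of squared deviations of y from the regression estimates Y_i.
    residual = sum((y - (mean_y + b * (x - mean_x))) ** 2 for x, y in zip(xs, ys))
    t = (b - beta) * math.sqrt((n - 2) * sxx / residual)
    r = sxy / math.sqrt(sxx * syy)
    return b, t, r

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
b, t, r = regression_t(xs, ys)
print(f"b = {b:.3f}, t = {t:.3f} for {len(xs) - 2} d.f., r = {r:.3f}")
```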


* Fisher, 1922, 2, p. 609. 

91. Distribution of the range of a sample. 

The range, $w$, of a sample of values is the difference between 
the highest and lowest values. This concept is widely used in 
the statistical control of quality in mass production by an 
industrial plant. Though the range does not conform to the 
$t$ distribution, it is convenient here to consider the sampling distribution 
of $w$ for samples of $n$ values from a continuous population, whose 
relative frequency density is $f(x)$ in the interval $(a, b)$. Consider 
the infinitesimal intervals $(u, u + du)$ and $(v, v + dv)$ within the 
range of variation of $x$, and such that $u < v$. Then the probability 
that, at any drawing, the value chosen will lie in the first of these 
is $f(u)\,du$, and in the second $f(v)\,dv$. The probability that it will 
lie in the intervening interval is 

$$P = \int_{u+du}^{v} f(x)\,dx. \tag{i}$$

Consequently the probability that, in the drawing of $n$ values, 
one will lie in the interval $du$, one in the interval $dv$, and the 
remaining $n - 2$ in the intervening interval is 

$$dP = n(n - 1)\,f(u)\,f(v)\,P^{n-2}\,du\,dv, \tag{ii}$$

since $n(n - 1)$ is the number of ways in which one of the $n$ values 
may fall in each of the intervals $du$ and $dv$. In other words, $dP$ 
is the probability that the lowest value in the sample will fall in 
the interval $du$, and the highest in the interval $dv$. 

We may express this in terms of $u$ and $w$ instead of $u$ and $v$. 
Since $w = v - u$ the Jacobian of $u, w$ with respect to $u, v$ is unity, 
and the probability that the lowest value will fall in the interval 
$du$, and $w$ in the interval $dw$, is therefore 

$$n(n - 1)\,f(u)\,f(u + w)\left(\int_u^{u+w} f(x)\,dx\right)^{n-2} du\,dw \tag{iii}$$

to within infinitesimals of this order. Summing for all intervals 
$du$ consistent with a range $w$, we find the probability that $w$ will 
fall in the interval $dw$, irrespective of the value of $u$, as 

$$dp = n(n - 1)\left[\int_a^{b-w} f(u)\,f(u + w)\left(\int_u^{u+w} f(x)\,dx\right)^{n-2} du\right] dw. \tag{12}$$

This is the required probability differential of $w$. Denoting the 
coefficient of $dw$ by $\phi(w)$ we have the expected value of $w$ as 

$$E(w) = \int_0^{b-a} w\,\phi(w)\,dw.$$

This expected value has been calculated by numerical integration 
for the case of a normal population of s.d. $\sigma$, and for various 
values of $n$. Among the results found are 

  $n$:             2      3      4      5      6     10     50
  $E(w)/\sigma$:     1.13   1.69   2.06   2.33   2.53   3.08   4.50

Example 1. If the variable in the population has a uniform 
distribution from 0 to $b$, then $f(x) = 1/b$. Show that the probability differential 
of the range is $n(n - 1)\,w^{n-2}(b - w)\,dw/b^n$, and that the expected value of 
$w$ is $(n - 1)\,b/(n + 1)$. 

Example 2. Show that, in taking a random sample of $n$ numbers 
between zero and unity, the probability that the range will exceed 0.5 
is $1 - (n + 1)/2^n$. Show also that 8 is the least value of $n$ for which this 
probability exceeds 0.96. 
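
The results stated in these two examples may be checked numerically. The sketch below is not a proof: it evaluates the closed form of Example 2, searches for the least $n$, and integrates the probability differential of Example 1 by the midpoint rule. All function names are illustrative assumptions.

```python
def prob_range_exceeds_half(n):
    """Probability that the range of n numbers from (0, 1) exceeds 0.5."""
    return 1 - (n + 1) / 2 ** n

def least_n(threshold):
    """Least sample size for which the probability above exceeds `threshold`."""
    n = 2
    while prob_range_exceeds_half(n) <= threshold:
        n += 1
    return n

def expected_range_uniform(n, b=1.0, steps=20000):
    """Midpoint-rule integral of w * n(n-1) w^(n-2) (b - w) / b^n over (0, b)."""
    h = b / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        density = n * (n - 1) * w ** (n - 2) * (b - w) / b ** n
        total += w * density * h
    return total

print(least_n(0.96))
print(expected_range_uniform(5), (5 - 1) / (5 + 1))
```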


Distribution of the Variance Ratio 

92. Ratio of independent estimates of the population variance 

As in § 88, let us consider two independent random samples whose 
values are $x_i$ ($i = 1, \ldots, n_1$) and $x'_j$ ($j = 1, \ldots, n_2$), with means $\bar{x}_1$ 
and $\bar{x}_2$ respectively. These provide estimates $s_1^2$ and $s_2^2$ of the variances 
of the populations, given by 

$$s_1^2 = \frac{\sum_i (x_i - \bar{x}_1)^2}{n_1 - 1} = \frac{n_1 S_1^2}{\nu_1}, \qquad s_2^2 = \frac{\sum_j (x'_j - \bar{x}_2)^2}{n_2 - 1} = \frac{n_2 S_2^2}{\nu_2},$$

corresponding to $\nu_1$ and $\nu_2$ d.f. respectively, where 

$$\nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1.$$

We wish to consider whether two such estimates are significantly 
different, or whether the samples may be regarded as drawn from 
the same normal population of variance $\sigma^2$. 

An appropriate test is furnished by the sampling distribution of 
the ratio $F$ of two such estimates of $\sigma^2$ obtained from independent 
samples from the same normal population. Thus if 

$$F = s_1^2/s_2^2,$$

we have 

$$\frac{\nu_1 F}{\nu_2} = \frac{n_1 S_1^2/\sigma^2}{n_2 S_2^2/\sigma^2}. \tag{13}$$

Now the numerator and denominator of the second member are 
distributed independently like $\chi^2$ with $\nu_1$ and $\nu_2$ d.f. respectively. 
Hence the quotient is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate, so that $\nu_1 F/\nu_2$ 
conforms to the distribution of § 72, with $l = \tfrac12\nu_1$ and $m = \tfrac12\nu_2$. 
Consequently the probability that the value of $F$ will fall in the interval 
$dF$ is 

$$dP = \frac{\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\tfrac12\nu_1, \tfrac12\nu_2)}\;\frac{F^{\nu_1/2 - 1}\,dF}{(\nu_1 F + \nu_2)^{(\nu_1 + \nu_2)/2}}. \tag{14}$$

This is the required distribution of the variance ratio for $\nu_1$ and $\nu_2$ d.f. 
It will be observed that the distribution is independent of the 
variance $\sigma^2$ of the population. 

Since $\nu_1 F/\nu_2$ is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate its modal value, by § 72 (28), 
is $(\tfrac12\nu_1 - 1)/(\tfrac12\nu_2 + 1)$. Consequently the modal value of $F$ is given by 

$$F = \frac{\nu_2(\nu_1 - 2)}{\nu_1(\nu_2 + 2)},$$

and this is always less than unity. Similarly, the mean value of 
$\nu_1 F/\nu_2$, by § 72 (29), is $\tfrac12\nu_1/(\tfrac12\nu_2 - 1)$, and therefore the expected 
value of $F$ is 

$$E(F) = \frac{\nu_2}{\nu_2 - 2},$$

which is independent of $\nu_1$ and is always greater than unity. The 
probability curve for $F$ depends, of course, on both $\nu_1$ and $\nu_2$; but 
its main features, for $\nu_1 > 4$, are those of the curve in Fig. 9. 
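
The distribution of the variance ratio, with its modal and mean values, can be sketched numerically as below. The density is written in the standard form of the $F$ distribution, evaluated through logarithms of gamma functions for numerical stability; the function names are assumptions made for illustration.

```python
import math

def f_density(F, nu1, nu2):
    """Density of the variance ratio F for nu1 and nu2 d.f. (F > 0)."""
    log_beta = (math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2)
                - math.lgamma((nu1 + nu2) / 2))
    log_p = ((nu1 / 2) * math.log(nu1) + (nu2 / 2) * math.log(nu2) - log_beta
             + (nu1 / 2 - 1) * math.log(F)
             - ((nu1 + nu2) / 2) * math.log(nu1 * F + nu2))
    return math.exp(log_p)

def f_mode(nu1, nu2):
    """Modal value of F; always less than unity (requires nu1 > 2)."""
    return nu2 * (nu1 - 2) / (nu1 * (nu2 + 2))

def f_mean(nu2):
    """Expected value of F; independent of nu1 (requires nu2 > 2)."""
    return nu2 / (nu2 - 2)

mode = f_mode(6, 10)
print(round(mode, 4), round(f_mean(10), 4))
print(f_density(mode, 6, 10) > f_density(mode + 0.05, 6, 10))
```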


93. Fisher’s z distribution. Table of F 


A distribution equivalent to (14) was first obtained by R. A. Fisher. 
Writing $z = \tfrac12 \log_e F$ and therefore $F = e^{2z}$ in the above result, we 
deduce immediately that 

$$dP = \frac{2\,\nu_1^{\nu_1/2}\,\nu_2^{\nu_2/2}}{B(\tfrac12\nu_1, \tfrac12\nu_2)}\;\frac{e^{\nu_1 z}\,dz}{(\nu_1 e^{2z} + \nu_2)^{(\nu_1 + \nu_2)/2}}. \tag{15}$$

This is Fisher's $z$ distribution. The probability that a specified value 
of $z$ will be exceeded in random sampling depends upon $\nu_1$ and $\nu_2$. 
Fisher published tables* giving the values of $z$ that will be exceeded 
with probabilities 0.05 and 0.01 respectively, corresponding to 
specified values of $\nu_1$ and $\nu_2$. From these G. W. Snedecor prepared 
a table† for the variance ratio, which he denoted by $F$ in honour of 
Fisher. Extracts from this table are here printed by permission of 
Snedecor and the Iowa Press. The ratio, $F$, tabulated is that of the 
larger estimate of variance to the smaller. The number $\nu_1$ of degrees 
of freedom corresponding to the larger estimate determines the 
column in the table, while $\nu_2$ determines the row. At the 
intersection of the row and the column are given two values of $F$. The 
upper is the value that will be exceeded with a probability 0.05, 
and the lower with a probability 0.01. These are often referred to 
as the 5 and 1 % 'points' of $F$. The latter is, of course, always the 
larger. The hypothesis to be tested is that the samples are from the 
same normal population, or from normal populations of equal 
variance. A value of $F$ less than the 5 % point is not significant. 
A value between the 5 and 1 % points is significant at the former 
level, but not at the latter. A value greater than the 1 % point is 
regarded as highly significant. A significant value of $F$ throws doubt 
on the truth of the hypothesis. This test will be used extensively in 
the next chapter. It is illustrated by the following numerical 
example. 

* Statistical Methods for Research Workers, 1926. 

† Snedecor, 1934, 2 or 1938, 3, pp. 184-7. 

Table 6. The variance ratio, 5 and 1 % 'points' of F 

$\nu_1$ is the number of degrees of freedom for the greater 
estimate of variance, and $\nu_2$ for the smaller. In each row the upper 
line gives the 5 % point and the lower line the 1 % point. 

 $\nu_2$ \ $\nu_1$     1       2       3       4       5       6       8      12      24      $\infty$
    2       18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50
            98.49   99.00   99.17   99.25   99.30   99.33   99.36   99.42   99.46   99.50
    3       10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53
            34.12   30.82   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.12
    4        7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63
            21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46
    5        6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36
            16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89    9.47    9.02
    6        5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67
            13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88
    7        5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23
            12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65
    8        5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93
            11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86
    9        5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71
            10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31
   10        4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54
            10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91
   12        4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30
             9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36
   14        4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13
             8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00
   16        4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01
             8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75
   18        4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92
             8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.01    2.57
   20        4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84
             8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42
   25        4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71
             7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62    2.17
   30        4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62
             7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01
   40        4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51
             7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.81
   60        4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39
             7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60
   80        3.96    3.11    2.72    2.49    2.33    2.21    2.06    1.88    1.66    1.32
             6.96    4.88    4.04    3.56    3.25    3.04    2.74    2.41    2.03    1.49

Extracted from G. W. Snedecor's Statistical Methods, pp. 184-7, by courtesy 
of the author and the Iowa Collegiate Press. 


Example. Apply the test to the samples of heights given in the examples 
of §§ 86 and 88 (Ex. 2). 

The two estimates of the population variance furnished by the samples 
are 36/8 and 42/9, corresponding to 8 and 9 d.f. respectively. The second is 
the larger, so that 

$$F = \frac{42/9}{36/8} = 1.04, \qquad \nu_1 = 9, \quad \nu_2 = 8.$$

This value of $F$ is much below the 5 % point as indicated by the table. It is 
therefore not at all significant, so that the samples may very well be regarded 
as drawn from the same population. 
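
The formation of the variance ratio, with the larger estimate always placed in the numerator, can be sketched as follows. The function name and its arguments are illustrative assumptions; the 5 and 1 % points are still to be read from the table of $F$.

```python
def variance_ratio(sum_sq1, n1, sum_sq2, n2):
    """Return (F, nu1, nu2), with nu1 belonging to the larger variance estimate.

    sum_sq1 and sum_sq2 are the sums of squares of deviations from the
    respective sample means.
    """
    est1, est2 = sum_sq1 / (n1 - 1), sum_sq2 / (n2 - 1)
    if est1 >= est2:
        return est1 / est2, n1 - 1, n2 - 1
    return est2 / est1, n2 - 1, n1 - 1

# Samples of 9 and 10 heights, with sums of squares 36 and 42.
F, nu1, nu2 = variance_ratio(36.0, 9, 42.0, 10)
print(f"F = {F:.2f} with nu1 = {nu1} and nu2 = {nu2}")
```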


Fisher’s Transformation of the 
Correlation Coefficient 


94. Distribution of r. Fisher’s transformation 
The distribution of the correlation coefficient $r$, in random 
samples of $n$ pairs of values from a bivariate normal population in 
which the correlation is $\rho$, was given by Fisher* in 1915. He showed 
that the probability that the correlation coefficient will have a 
value in the interval $dr$ is 

$$dp = \frac{(1 - \rho^2)^{\frac12(n-1)}(1 - r^2)^{\frac12(n-4)}}{\pi\,(n - 3)!}\;\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\arccos(-r\rho)}{\sqrt{1 - r^2\rho^2}}\right\} dr.$$

This distribution is far from normal, with a probability curve which 
is very skew in the neighbourhood of $\rho = \pm 1$, even for large samples. 
The use of the s.e. of $r$ is therefore not to be recommended. In a 
subsequent paper† Fisher showed that the transformation 

$$z = \tfrac12 \log_e \frac{1 + r}{1 - r}, \qquad \zeta = \tfrac12 \log_e \frac{1 + \rho}{1 - \rho} \tag{16}$$

defines a variate $z$, whose distribution is approximately normal with 
mean $\zeta$ and variance $1/(n - 3)$, tending rapidly to normality as the 
size of the sample increases. Thus the s.e. of $z$ is independent of the 
value of $r$. For the proofs of these properties we must refer the 
reader to Fisher's papers; but the distribution of $r$ in the case of an 
uncorrelated normal population was considered in § 82, and the 
corresponding distribution of $z$ will be discussed in Ex. 1 below. 

* Fisher, 1915, 2. 

† Fisher, 1921, 1. 

By means of the statistic $z$ we may test whether an observed 
correlation coefficient differs significantly from some theoretical 
value, or from some value given in advance; or whether the values 
of $r$ obtained from two samples differ significantly. From the 
values of $r$ and $\rho$ we determine those of $z$ and $\zeta$ by (16); and it is 
then easy to decide whether the deviation $z - \zeta$ is significant for a 
normal distribution of variance $1/(n - 3)$. To obviate the necessity 
of calculating $z$ in every case, Fisher published a table setting out 
the values of $r$, which correspond to specified values of $z$ ranging 
from 0 to 3 at intervals of 0.01. Extracts from this table are printed 
herewith. 

Table 7. Fisher's transformation of r 
Values of r for specified values of z at intervals of 0.02 

   z      0.00     0.02     0.04     0.06     0.08
  0.0    0.000    0.020    0.040    0.060    0.080
  0.1    0.100    0.119    0.139    0.159    0.178
  0.2    0.197    0.217    0.236    0.254    0.273
  0.3    0.291    0.310    0.328    0.345    0.363
  0.4    0.380    0.397    0.414    0.430    0.446
  0.5    0.462    0.478    0.493    0.508    0.523
  0.6    0.537    0.551    0.565    0.578    0.592
  0.7    0.604    0.617    0.629    0.641    0.653
  0.8    0.664    0.675    0.686    0.696    0.706
  0.9    0.716    0.726    0.735    0.744    0.753
  1.0    0.762    0.770    0.778    0.786    0.793
  1.1    0.801    0.808    0.814    0.821    0.828
  1.2    0.834    0.840    0.846    0.851    0.857
  1.3    0.862    0.867    0.872    0.876    0.881
  1.4    0.885    0.890    0.894    0.898    0.902
  1.5    0.905    0.909    0.912    0.915    0.919
  1.6    0.922    0.925    0.928    0.930    0.933
  1.7    0.935    0.938    0.940    0.943    0.945
  1.8    0.947    0.949    0.951    0.953    0.955
  1.9    0.956    0.958    0.960    0.961    0.963

Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 


For the case $\zeta = 0$, that is to say for testing whether an observed 
value of $r$ indicates any correlation in the population, the method 
of § 89 is preferable. 

Example 1. For samples from an uncorrelated normal population, we 
know that the distribution of $r$ is 

$$dp = \frac{(1 - r^2)^{\frac12(n-4)}\,dr}{B(\tfrac12, \tfrac12(n - 2))}.$$

Fisher's transformation (16) may be expressed 

$$r = \tanh z,$$

so that 

$$dr = \operatorname{sech}^2 z\,dz,$$

and the distribution of $z$ is therefore 

$$dp = \operatorname{sech}^{n-2} z\,dz / B(\tfrac12, \tfrac12(n - 2)).$$

Now $\operatorname{sech} z$ is approximately equal to $\exp(-\tfrac12 z^2)$, so that 

$$dp \propto \exp\left(-\tfrac12(n - 2)z^2\right) dz$$

approximately. Consequently the distribution of $z$ is approximately normal, 
with variance $1/(n - 2)$. As Fisher has shown, however, a better 
approximation to the variance of $z$ is $1/(n - 3)$. 

Example 2. In a random sample of 28 pairs of values from a bivariate 
normal population, the correlation was found to be 0.7. Is this value 
consistent with the assumption that the correlation in the population is 0.5? 

Here $r = 0.7$, $\rho = 0.5$ and $n = 28$. From the table we find $z = 0.87$ and 
$\zeta = 0.55$, so that $z - \zeta = 0.32$. 

The s.e. of $z$ is $1/\sqrt{25} = 0.2$, so that 

$$(z - \zeta)/(\text{s.e.}) = 1.6.$$

Since $z - \zeta$ is less than twice the s.e., its value is not significant. 
So far as this test goes, the correlation in the population might very well 
be 0.5. 

The 95 % fiducial limits for $\rho$ are found in the usual manner (cf. § 61). 
The value of $\zeta$ must be such that 

$$|z - \zeta| < 1.96\,(\text{s.e.}) = 0.392.$$

Consequently 

$$0.87 - 0.392 < \zeta < 0.87 + 0.392,$$

or 

$$0.48 < \zeta < 1.26,$$

and therefore from the table 

$$0.446 < \rho < 0.851.$$

The 95 % fiducial limits for $\rho$ are therefore 0.45 and 0.85 approximately. 
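
In place of the table, the transformation (16) and its inverse are available as the inverse hyperbolic tangent and the hyperbolic tangent. The following sketch repeats the calculations of Example 2 in that way; the function names are illustrative, and because it does not round $z$ to two decimals the results differ slightly in the third figure from those read from the table.

```python
import math

def z_deviation_in_se(r, rho, n):
    """(z - zeta) divided by the s.e. 1/sqrt(n - 3), using z = artanh r."""
    standard_error = 1 / math.sqrt(n - 3)
    return (math.atanh(r) - math.atanh(rho)) / standard_error

def fiducial_limits_rho(r, n, normal_deviate=1.96):
    """Limits for rho from z -/+ normal_deviate * s.e., transformed back by tanh."""
    z = math.atanh(r)
    half_width = normal_deviate / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

ratio = z_deviation_in_se(0.7, 0.5, 28)
lower, upper = fiducial_limits_rho(0.7, 28)
print(f"(z - zeta)/s.e. = {ratio:.2f}")
print(f"95 % limits for rho: {lower:.2f} to {upper:.2f}")
```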


95. Comparison of correlations in independent samples 

Next suppose that two independent samples of $n_1$ and $n_2$ pairs 
give correlation coefficients of $r_1$ and $r_2$ respectively. May they be 
regarded as drawn from the same population; or is the difference 
between $r_1$ and $r_2$ significant? On the assumption that the samples 
are from the same normal population, the difference between the 
values of $z$ for the two samples is normally distributed with s.e. 

$$\epsilon = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}.$$

The two values of $z$ are 

$$z_1 = \tfrac12 \log_e \frac{1 + r_1}{1 - r_1}, \qquad z_2 = \tfrac12 \log_e \frac{1 + r_2}{1 - r_2}.$$

If $|z_1 - z_2|$ is less than $2\epsilon$, the difference is not significant at the 
5 % level; and the assumption that the samples are from the same 
population, or from equally correlated normal populations, is not 
discredited. 


Example. The first of two samples consists of 23 pairs, and gives a 
correlation of 0.5; while the second, of 28 pairs, has a correlation of 0.8. Are these 
values significantly different? 

On the hypothesis that they are from the same normal population, the 
s.e. of the difference of the $z$'s is 

$$\epsilon = \sqrt{\tfrac{1}{20} + \tfrac{1}{25}} = \sqrt{0.09} = 0.3.$$

From the table we find that $z_1 = 0.55$ and $z_2 = 1.10$, so that 

$$|z_1 - z_2| = 0.55 = 1.83\,\epsilon,$$

which is a little less than $2\epsilon$, and is therefore not quite significant at the 5 % 
level. The hypothesis is not discredited. 

If in the above example the value of $r_1$ is not given, we may find the limits 
between which it must lie in order that $|r_1 - r_2|$ should not be significant at 
the 5 % level. For the condition to be satisfied is 

$$|z_1 - z_2| < 1.96\,\epsilon = 0.588.$$

Consequently 

$$1.10 - 0.588 < z_1 < 1.10 + 0.588,$$

or 

$$0.512 < z_1 < 1.688,$$

so that $0.47 < r_1 < 0.93$. 
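
The comparison of two correlations may be sketched in the same manner, again using the hyperbolic functions instead of the table. Function names are assumptions for illustration only.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Return (|z1 - z2| / epsilon, epsilon) for two independent samples."""
    epsilon = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    difference = abs(math.atanh(r1) - math.atanh(r2))
    return difference / epsilon, epsilon

def limits_for_r1(r2, n1, n2, normal_deviate=1.96):
    """Range of r1 for which the difference from r2 is not significant."""
    epsilon = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z2 = math.atanh(r2)
    return (math.tanh(z2 - normal_deviate * epsilon),
            math.tanh(z2 + normal_deviate * epsilon))

multiple, epsilon = compare_correlations(0.5, 23, 0.8, 28)
low_r1, high_r1 = limits_for_r1(0.8, 23, 28)
print(f"epsilon = {epsilon:.2f}, |z1 - z2| = {multiple:.2f} epsilon")
print(f"r1 between {low_r1:.2f} and {high_r1:.2f}")
```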


96. Combination of estimates of a correlation coefficient* 

Suppose that $k$ samples of $n_i$ pairs of values ($i = 1, \ldots, k$) yield 
correlation coefficients $r_i$. We may wish to enquire whether the 
samples may be regarded as drawn from the same normal population 
(or equally correlated ones); and, if they may be so regarded, to 
obtain a combined estimate of the population value $\rho$. 

* Cf. Yates, 1934, 6. 

To test the homogeneity of the estimates $r_i$, we make the assumption that the 
samples are from equally correlated populations. Then, by means 
of Fisher's transformation (16), we obtain values $z_i$ of variates 
which are approximately normally distributed about a common 
mean, with variances $1/(n_i - 3)$. The estimate of their common 
mean, which has minimum variance, is obtained by weighting 
the values inversely* as their variances. This estimate $\bar{z}$ is therefore 

$$\bar{z} = \frac{\sum (n_i - 3)\,z_i}{\sum (n_i - 3)}. \tag{17}$$

Then, since the variates $z_i$ are approximately normally distributed 
about $\zeta$ with variances $1/(n_i - 3)$, the sum $\sum (n_i - 3)(z_i - \bar{z})^2$ is 
distributed approximately as $\chi^2$ with $k - 1$ d.f., the mean $\bar{z}$ having 
been determined from the data.† The significance of the calculated 
value of this quantity may be ascertained from the table of $\chi^2$. 
We may express the above sum in a form more convenient for 
numerical calculation. Thus 

$$\sum (n_i - 3)(z_i - \bar{z})^2 = \sum (n_i - 3)\,z_i^2 - \bar{z}^2 \sum (n_i - 3)$$
$$= \sum (n_i - 3)\,z_i^2 - \Bigl[\sum (n_i - 3)\,z_i\Bigr]^2 \Big/ \sum (n_i - 3).$$

If the calculated value of this expression is not significant as a 
value of $\chi^2$ with $k - 1$ d.f., the estimates of the correlation in the 
population may be regarded as homogeneous. In that case the 
value of $\bar{z}$ given by (17) is an estimate of the true value $\zeta$, 
corresponding to the population coefficient $\rho$. The required estimate of 
$\rho$ is then given by 

$$\rho = \tanh \bar{z},$$

and its value may be read off from the table. 

* Cf. Ex. IV, 6.

† For details of this part of the proof see Ex. X, 10 below.

Example. Independent samples of 21, 30, 39, 26 and 35 pairs of values
yielded correlation coefficients 0.39, 0.61, 0.43, 0.54 and 0.48 respectively.
May these estimates be regarded as homogeneous? If so, find an estimate of
the correlation in the population.

The corresponding values of $z_i$ may be taken from the table, and the
calculation tabulated as follows:

  $r_i$     $z_i$     $n_i - 3$    $(n_i - 3) z_i$    $(n_i - 3) z_i^2$

  0.39     0.412       18           7.416             3.055
  0.61     0.709       27          19.143            13.572
  0.43     0.460       36          16.560             7.618
  0.54     0.604       23          13.892             8.391
  0.48     0.524       32          16.768             8.786

  Totals     -        136          73.779            41.422

The value of $\chi^2$ from the data is therefore

$$\chi^2 = 41.422 - (73.779)^2/136 = 1.4 \text{ approximately.}$$

For 4 d.f. this value is not at all significant, so that the coefficients may be
regarded as homogeneous. From (17) we have

$$\bar{z} = 73.779/136 = 0.5425.$$

The table then gives the estimate of $\rho$ for the population as 0.495 = 0.5
nearly.
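The arithmetic of this example may be checked by machine. The following Python sketch is an illustration only and is not part of the original text: the function name `pooled_correlation` is hypothetical, `math.atanh` is used in place of the three-figure table of $z$, and the sample sizes and coefficients are entered as 21, 30, 39, 26, 35 and 0.39, 0.61, 0.43, 0.54, 0.48, these being the readings consistent with the tabulated values of $n_i - 3$ and $z_i$.

```python
import math


def pooled_correlation(sizes, coeffs):
    """Return (chi2, d.f., z_bar, rho_hat) for k independent correlations."""
    weights = [n - 3 for n in sizes]           # reciprocals of the variances of z_i
    z = [math.atanh(r) for r in coeffs]        # Fisher's transformation
    sw = sum(weights)
    swz = sum(w * zi for w, zi in zip(weights, z))
    swz2 = sum(w * zi * zi for w, zi in zip(weights, z))
    z_bar = swz / sw                           # weighted mean, equation (17)
    chi2 = swz2 - swz ** 2 / sw                # computational form of the sum
    return chi2, len(sizes) - 1, z_bar, math.tanh(z_bar)


chi2, df, z_bar, rho_hat = pooled_correlation(
    [21, 30, 39, 26, 35], [0.39, 0.61, 0.43, 0.54, 0.48])
print(f"chi2 = {chi2:.2f} on {df} d.f.; z_bar = {z_bar:.4f}; rho = {rho_hat:.3f}")
```

The results agree with the tabulated calculation to within the rounding of the table. The value of $\chi^2$ is to be compared with the 5 % point for 4 d.f., which is 9.49.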

COLLATERAL READING

Fisher, 1938, 2, chapters v, vi and vii.
Fisher, 1915, 2; 1921, 1; 1922, 2; 1924, 2; 1925, 1 and 1935, 1.
Tippett, 1931, 2, chapter v.
Yule and Kendall, 1937, 1, chapter xxiii.
Rider, 1930, 1; 1939, 6, chapters vi and viii.
Kenney, 1939, 3, part ii, chapter vii and pp. 172-86.
‘Student’, 1908, 1 and 2.
Smith, 1939, 7.
Aitken, 1939, 1, chapter vii.
Mills, 1938, 1, chapter xviii, pp. 598-618.
Rietz, 1938, 6.
Sawkins, 1940, 3; 1941, 1.
Pitman, 1937, 3.
Kendall, 1943, 2, chapter x and pp. 336-47.


EXAMPLES X 

1. A random sample of 16 values from a normal population
showed a mean of 41.5 in., and a sum of squares of deviations from
this mean equal to 135 in.² Show that the assumption of a mean
of 43.5 in. for the population is not reasonable, and that the 95 %
fiducial limits for this mean are 39.9 and 43.1 in.

Another sample of 20 values from an unknown population has a
mean of 43.0 in., and a sum of squares of deviations from this mean
equal to 171 in.² Show that the two samples may be regarded as
from the same normal population.

2. Nine patients, to whom a certain drink was administered,
registered the following increments in blood pressure: 7, 3, −1, 4,
−3, 5, 6, −4, 1. Show that the data do not indicate that the drink
was responsible for these increments.

On the null hypothesis the above values are regarded as a random
sample from a normal population whose mean is zero. The data
give $\bar{x} = 2$ and $s^2 = 63/4$. Hence $t = 1.51$ with $\nu = 8$. This value is
not significant.
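The values quoted in this solution may be verified as follows. The sketch is illustrative only; the function name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(values, mu0=0.0):
    """Student's t for the hypothesis that the population mean is mu0."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    s2 = ss / (n - 1)                          # unbiased estimate of variance
    t = (mean - mu0) / math.sqrt(s2 / n)
    return mean, s2, t, n - 1


increments = [7, 3, -1, 4, -3, 5, 6, -4, 1]
mean, s2, t, df = one_sample_t(increments)
print(mean, s2, round(t, 2), df)   # prints 2.0 15.75 1.51 8
```

For 8 d.f. the 5 % value of $t$ is 2.306, so that the calculated value falls well short of significance.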

3. In testing the superiority of Leake’s drill over the ordinary
drill, plots in the form of long strips were cultivated, two adjacent
strips being allotted at random to Leake’s drill and the ordinary.*
For ten such pairs of plots the values of the excess of the weight of
grain from the plot treated by Leake’s drill over that obtained by
use of the ordinary drill were 2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, −0.4,
0.1 and 0.7. Show that the data furnish strong evidence of the
superiority of Leake’s drill.

From the data $\bar{x} = 0.83$ and $\sum (x - \bar{x})^2 = 6.001$, so that

$$s^2 = 6.001/9 = 0.666.$$

On the null hypothesis the mean of the population is zero, and
$t = 3.22$ for 9 d.f. This value belongs to about the 1 % level of
significance. There is thus strong evidence against the null
hypothesis.

* Data from Wishart, 1934, 3, p. 32.

4. For a random sample of 10 pigs, fed on diet A, the increases in
weight in a certain period were

10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lb.

For another random sample of 12 pigs, fed on diet B, the increases
in the same period were

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lb.

Show that, by the test of § 88, the mean increases of 12 and 15 lb.
in the two samples are not significantly different.

[$s^2 = (120 + 314)/20 = 21.7$; $t = 1.5$, $\nu = 20$.]
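The bracketed values may be checked by a short calculation. In the sketch below, which is illustrative only, the function name `pooled_t` is hypothetical, and the diet B values are entered so that their mean is 15 lb. and their sum of squares 314, as the bracketed hint requires.

```python
import math


def pooled_t(sample_a, sample_b):
    """Two-sample t with a pooled estimate of variance (equal variances assumed)."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in sample_a)
    ss_b = sum((v - mean_b) ** 2 for v in sample_b)
    df = na + nb - 2
    s2 = (ss_a + ss_b) / df                    # pooled estimate of variance
    t = (mean_b - mean_a) / math.sqrt(s2 * (1 / na + 1 / nb))
    return ss_a, ss_b, s2, t, df


diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]
ss_a, ss_b, s2, t, df = pooled_t(diet_a, diet_b)
print(ss_a, ss_b, round(s2, 1), round(t, 1), df)
```

For 20 d.f. the 5 % value of $t$ is 2.086, so that the difference of the means is not significant.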

5. Show that the estimates of the population variance from the
samples in Ex. 4 are not significantly different.

6. Deduce from § 82 (25), that the statistic $r\sqrt{n - 2}/\sqrt{1 - r^2}$
conforms to the $t$ distribution for $n - 2$ d.f.

7. A random sample of 18 pairs from a bivariate normal population
showed a correlation coefficient of 0.3. Is this value significant
of correlation in the population? Prove by the method of § 89,
Ex. 2, that the least value of $r$ significant at the 5 % level is
about 0.47.

8. A random sample of 19 pairs from a bivariate normal population
showed a correlation of 0.65. Prove that this is consistent with
the assumption of a correlation of 0.40 in the population. Also
show that the 95 % fiducial limits for $\rho$ are 0.28 and 0.85
approximately.

A second sample, of 23 pairs, showed a correlation of 0.40. Prove
by the method of § 95 that the two samples may be regarded as
from equally correlated populations.
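The limits quoted in Ex. 8 follow from Fisher's transformation, and may be checked as below. The sketch is illustrative only: the function names are hypothetical, the normal deviate 1.96 is taken for the 95 % limits, and it is assumed (the section is not reproduced here) that the method of § 95 compares the two values of $z$ by means of their difference and its standard error.

```python
import math


def rho_limits(r, n, normal_deviate=1.96):
    """Approximate limits for rho from a sample correlation r of n pairs."""
    z = math.atanh(r)                  # Fisher's z
    se = 1 / math.sqrt(n - 3)          # approximate standard error of z
    return math.tanh(z - normal_deviate * se), math.tanh(z + normal_deviate * se)


def z_difference(r1, n1, r2, n2):
    """Difference of two z values divided by its standard error."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


lower, upper = rho_limits(0.65, 19)
deviate = z_difference(0.65, 19, 0.40, 23)
print(round(lower, 2), round(upper, 2), round(deviate, 2))
```

The deviate for the two samples is numerically less than 1.96, which is the sense in which the samples may be regarded as from equally correlated populations.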

9. Show that the mean value of the positive square root of a
Beta variate of the second kind, with parameters $l$ and $m$, is

$$\Gamma(l + \tfrac{1}{2})\,\Gamma(m - \tfrac{1}{2}) \big/ \Gamma(l)\,\Gamma(m).$$

Deduce that the mean value of $|t|$ for $\nu$ d.f. is

$$\sqrt{\nu}\;\Gamma(\tfrac{1}{2}(\nu - 1)) \big/ \sqrt{\pi}\;\Gamma(\tfrac{1}{2}\nu).$$

Also show that, for samples from an uncorrelated bivariate normal
population in which the variances are $\sigma_1^2$ and $\sigma_2^2$, the mean value of
the modulus of the regression coefficient of $y$ on $x$ is

$$\sigma_2\,\Gamma(\tfrac{1}{2}(n - 2)) \big/ \sigma_1\,\Gamma(\tfrac{1}{2}(n - 1))\sqrt{\pi}.$$

10. To prove that the sum $\sum (n_i - 3)(z_i - \bar{z})^2$ of § 96 is distributed
approximately as $\chi^2$ with $k - 1$ d.f., we may proceed by the method
of § 77. Denoting the sum by $T^2$ we have

$$T^2 = \sum (n_i - 3)\big[(z_i - \zeta) - (\bar{z} - \zeta)\big]^2$$

$$= \sum (n_i - 3)(z_i - \zeta)^2 - 2(\bar{z} - \zeta)\sum (n_i - 3)(z_i - \zeta) + (\bar{z} - \zeta)^2 \sum (n_i - 3)$$

$$= \sum (n_i - 3)(z_i - \zeta)^2 - \theta^2, \tag{i}$$

where $\theta = \sum (n_i - 3)(z_i - \zeta) \big/ \sqrt{\sum (n_i - 3)}$.

Now introduce an orthogonal linear transformation

$$\xi_i = \sum_j c_{ij}\,\eta_j,$$

where $\eta_i = (z_i - \zeta)\sqrt{n_i - 3}$,
and the first of the $\xi$'s is identical with $\theta$ above. Then, since the $\eta$'s
are independently and normally distributed about zero with unit
S.D., so are the $\xi$'s; and in virtue of (i)

$$T^2 = \sum_{i=1}^{k} \eta_i^2 - \xi_1^2 = \sum_{i=1}^{k} \xi_i^2 - \xi_1^2 = \sum_{i=2}^{k} \xi_i^2.$$

Consequently $T^2$ is distributed like $\chi^2$ with $k - 1$ d.f.

11. Show that, for the $t$ distribution with $\nu$ d.f., the moment of
order $r$ (even) about the origin is, for $r < \nu$,

$$(r - 1)(r - 3) \ldots 1 \cdot \nu^{r/2} \big/ (\nu - 2)(\nu - 4) \ldots (\nu - r).$$

12. For samples of $n$ from a population with the exponential
distribution $f(x) = e^{-x}$, $x \geq 0$, show that the range conforms to

$$\psi(w) = (n - 1)\, e^{-w} (1 - e^{-w})^{n-2},$$

and hence that $E(w) = 1 + \tfrac{1}{2} + \tfrac{1}{3} + \ldots + 1/(n - 1)$.



CHAPTER XI 


ANALYSIS OF VARIANCE AND COVARIANCE 

Analysis of Variance 

97. Resolution of the ‘sum of squares’

In the words of its author, R. A. Fisher, analysis of variance is
the ‘separation of the variance ascribable to one group of causes
from the variance ascribable to other groups’.* It is a procedure by
which the variation embodied in the data of the sample may be
resolved into component variations due to independent factors.
Each of the components yields an estimate of the population
variance; and these estimates are tested for homogeneity by means of
the $F$ table.

Consider a random sample of $N$ values of a normally distributed
variable $x$. It is frequently possible to arrange these in classes
according to a certain factor or criterion. For instance, if the
variable is the price of a certain commodity, the classes may
correspond to different seasons or to different districts. Or, if the
variable is the crop yield of a variety of cereal, the classes may
correspond to different manurial treatments. Let $x_{ij}$ denote the
value of the $j$th member in the $i$th class. Thus the first subscript
indicates the class, and the second the position in that class. Let
$n_i$ be the number of members in the $i$th class, $\bar{x}_i$ the mean value for
that class, and $\bar{x}$ the general mean for the whole sample of $N$ values.


* Fisher, 1938, 2, p. 216.

Then

$$N\bar{x} = \sum_i \sum_j x_{ij}, \qquad \sum_i \sum_j (x_{ij} - \bar{x}) = 0, \tag{1}$$

and

$$n_i \bar{x}_i = \sum_j x_{ij}, \qquad \sum_j (x_{ij} - \bar{x}_i) = 0. \tag{2}$$

To resolve the sum of squares $\sum_i \sum_j (x_{ij} - \bar{x})^2$ of the deviations
of the $N$ values from the general mean, we may do so first for the
members of the $i$th class. Thus, in virtue of § 3 (10),

$$\sum_j (x_{ij} - \bar{x})^2 = \sum_j (x_{ij} - \bar{x}_i)^2 + n_i (\bar{x}_i - \bar{x})^2,$$

and on summing this result for all the classes we have the required
resolution of the ‘sum of squares’

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 + \sum_i n_i (\bar{x}_i - \bar{x})^2. \tag{3}$$

This formula holds, of course, whether the population is normal
or not.
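Since the resolution (3) is an algebraic identity, it may be verified for any set of numbers whatever. The following sketch does so for a small set of hypothetical values arranged in three classes of unequal size; neither the data nor the function name `resolve_sum_of_squares` come from the text.

```python
def resolve_sum_of_squares(classes):
    """Return the (total, within, between) sums of squares of equation (3)."""
    values = [x for cls in classes for x in cls]
    grand_mean = sum(values) / len(values)
    total = sum((x - grand_mean) ** 2 for x in values)
    within = 0.0
    between = 0.0
    for cls in classes:
        class_mean = sum(cls) / len(cls)
        within += sum((x - class_mean) ** 2 for x in cls)
        between += len(cls) * (class_mean - grand_mean) ** 2
    return total, within, between


# Hypothetical data: three classes containing 3, 2 and 4 values.
classes = [[4.0, 6.0, 5.0], [9.0, 7.0], [1.0, 3.0, 2.0, 2.0]]
total, within, between = resolve_sum_of_squares(classes)
print(total, within, between)
```

Here the total sum of squares, 56, is the sum of the part within classes, 6, and the part between class means, 50, apart from rounding in the floating-point arithmetic.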


98. Homogeneous population. One criterion of classification 
Now suppose that the population, from which the random sample
of values was drawn, is homogeneous with respect to the factor of
classification, that is to say, that the factor has no effect upon the
value of the variate. Then if the population is divided into classes
according to this factor, the different classes will have the same
statistical properties. In particular, they will have the same mean
$\mu$ and the same variance $\sigma^2$, which are the mean and the variance of
the population. Then, from the various sums in (3), we can obtain
three unbiased estimates of $\sigma^2$. For, by (7) and (8) of § 54, the
expected value of the first sum in sampling is $(N - 1)\sigma^2$, so that this
sum, divided by $N - 1$, gives an unbiased estimate of $\sigma^2$ based on
$N - 1$ d.f. Similarly, the values in the $i$th class constitute a
random sample whose mean is $\bar{x}_i$, so that

$$E\Big[\sum_j (x_{ij} - \bar{x}_i)^2\Big] = (n_i - 1)\sigma^2,$$

and therefore

$$E\Big[\sum_i \sum_j (x_{ij} - \bar{x}_i)^2\Big] = \sum_i (n_i - 1)\sigma^2 = (N - h)\sigma^2,$$

where $h$ is the number of classes. Thus the second sum in (3), divided
by $N - h$, gives an unbiased estimate of $\sigma^2$ based on $N - h$ d.f.
And, since the expected values of the two members of (3) must be
equal, that of the final sum* is $(h - 1)\sigma^2$; so that this sum, divided
by $h - 1$, gives an unbiased estimate of $\sigma^2$ based on $h - 1$ d.f. The
identity

$$(N - 1)\sigma^2 = (N - h)\sigma^2 + (h - 1)\sigma^2, \tag{4}$$

obtained by taking expected values of the various sums in (3),
shows that degrees of freedom are additive, the number of freedoms
corresponding to the total sum being equal to the sum of the
freedoms corresponding to the partial sums. The above results are
usually tabulated as follows:

  Source of variation    D.F.       Sum of squares                              Mean square

  Between class means    $h - 1$    $\sum_i n_i (\bar{x}_i - \bar{x})^2$        $\sum_i n_i (\bar{x}_i - \bar{x})^2/(h - 1)$
  Within classes         $N - h$    $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$      $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2/(N - h)$

  Total                  $N - 1$    $\sum_i \sum_j (x_{ij} - \bar{x})^2$        -

* See also Ex. XI, 1, below.


In the columns headed ‘D.F.’ and ‘Sum of squares’ the items are
additive, but not in the last column which gives the estimates of $\sigma^2$.

The argument so far holds whether the population is normal or
not. In the case of a homogeneous normal population the results
follow from the distributions of the various sums in (3). For then
the first sum divided by $\sigma^2$ is distributed like $\chi^2$ with $N - 1$ d.f., as
proved in § 77. The mean value of this sum is therefore $(N - 1)\sigma^2$.
Similarly, $\sum_j (x_{ij} - \bar{x}_i)^2/\sigma^2$ is distributed like $\chi^2$ with $n_i - 1$ d.f.;
and therefore the second sum in (3), divided by $\sigma^2$, is distributed
like $\chi^2$ with

$$\sum_i (n_i - 1) = (N - h) \text{ d.f.}$$

The mean value of this sum is therefore $(N - h)\sigma^2$. Similarly for
the final sum in (3).

In order to test the homogeneity of the estimates of $\sigma^2$ by means
of the variance ratio and the $F$ table, it is necessary to assume that
the population is normal; for this test is founded on that assumption.
The practice is to compare the estimate ‘between class means’ with
that obtained ‘within classes’. For the final sum in (3) represents
the variation due to the factor of classification, and the second sum
in (3) is the residual variation after the former has been removed.
If the estimate obtained between classes is significantly greater
than that within classes, we are justified in concluding that the
factor of classification exercises an influence on the value of the
variable. In that case the assumption of homogeneity is discredited,
and we must regard the population as heterogeneous. If, however,
the estimates of $\sigma^2$ are not significantly different, the test provides
no evidence against the hypothesis of a homogeneous population.
It is important to remember that the variance ratio may be tested
by means of the $F$ table only if the two estimates of variance are
statistically independent. Since the mean of a random sample from
a normal population is distributed independently of its variance,
the two sums in the second member of (3) are independent, and the
required condition is satisfied.*

* See also Fisher, 1925, 1 and Irwin, 1934, 4.

99. Calculation of the sums of squares

In calculating the above sums of squares it is not necessary to
find the deviations from the various means. We know that the sum
of the squares of the deviations of $N$ numbers $x_s$ ($s = 1, \ldots, N$), from
their mean $\bar{x}$ is, in virtue of § 3 (11),

$$\sum_s (x_s - \bar{x})^2 = \sum_s x_s^2 - N\bar{x}^2 = \sum_s x_s^2 - T^2/N,$$

where $T$ is the sum of the numbers. Applying this formula to the
various sums considered above we have

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - T^2/N, \tag{5}$$

the grand total $T$ being given by

$$T = \sum_i \sum_j x_{ij}.$$

Similarly, summing first for the values in the $i$th class, we have

$$\sum_j (x_{ij} - \bar{x}_i)^2 = \sum_j x_{ij}^2 - T_i^2/n_i,$$

where $T_i$ is the sum of the values in the $i$th class. Consequently

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \sum_i \sum_j x_{ij}^2 - \sum_i T_i^2/n_i. \tag{6}$$

Subtracting (6) from (5) we find for the third sum of squares

$$\sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i T_i^2/n_i - T^2/N. \tag{7}$$

This result may also be obtained directly. For, since $n_i$ is the
frequency of the value $\bar{x}_i$,

$$\sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i \bar{x}_i^2 - \Big(\sum_i n_i \bar{x}_i\Big)^2 \Big/ N = \sum_i T_i^2/n_i - T^2/N,$$

as stated.

Since the deviations from the means are independent of the choice
of origin, the results obtained by using (5), (6) and (7) are unaltered
by a change of origin. In other words, if all the values $x_{ij}$ are
decreased (or increased) by the same constant, the values obtained
for the three sums of squares are unchanged. The arithmetic may
often be simplified in this way, large numbers being replaced by
much smaller ones.

Ex. 35 plots, of approximately equal fertility, were sown with 7 different
varieties of wheat, 5 plots to each variety, the distribution of varieties among
the plots being random. The following table gives the yields of grain in
bushels per acre, the 7 columns corresponding to the different varieties. Do
the data (fictitious) indicate a significant difference in the yields of the
varieties?

    13   15   14   14   17   15   16
    11   11   10   10   15    9   12
    10   13   12   15   14   13   13
    16   18   13   17   19   14   15
    12   12   11   10   12   10   11

The classification is according to variety. The number of classes is $h = 7$,
and the number of items in each class is $n_i = 5$. Consequently $N = 35$. The
arithmetic is simplified by shifting the origin to (say) $x = 12$. Diminishing
all the yields by 12 we may rewrite the table:

     1    3    2    2    5    3    4
    −1   −1   −2   −2    3   −3    0
    −2    1    0    3    2    1    1
     4    6    1    5    7    2    3
     0    0   −1   −2    0   −2   −1

from which we have

$$T_i = 2,\ 9,\ 0,\ 6,\ 17,\ 1,\ 7; \qquad T = 42;$$

$$\bar{x}_i = 0.4,\ 1.8,\ 0,\ 1.2,\ 3.4,\ 0.2,\ 1.4.$$

Hence

$$\sum_i \sum_j x_{ij}^2 = 266, \qquad T^2/N = 50.4, \qquad \sum_i T_i^2/n_i = 92.$$

The three sums of squares are therefore, by (5), (6) and (7),

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = 266 - 50.4 = 215.6,$$

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = 266 - 92 = 174,$$

$$\sum_i n_i (\bar{x}_i - \bar{x})^2 = 92 - 50.4 = 41.6.$$

In tabular form:

  Source of variation    D.F.    Sum of squares    Mean square    $F$

  Between varieties        6          41.6            6.933       1.1
  Within varieties        28         174              6.214        -

  Total                   34         215.6             -           -

For $\nu_1 = 6$ and $\nu_2 = 28$ the value 1.1 of $F$ is not significant. Since the estimates
of variance between varieties and within varieties are not significantly
different, the experiment as a whole does not indicate significant variation
in the yields of varieties.
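The foregoing calculation may be reproduced by machine from formulae (5), (6) and (7). The sketch below is illustrative only; the function name `one_way_anova` is hypothetical, and the yields are entered as the five rows of the table, one column to each variety.

```python
def one_way_anova(classes):
    """One-way analysis of variance by formulae (5), (6) and (7).

    `classes` holds one list of values for each class."""
    n_total = sum(len(c) for c in classes)
    grand_total = sum(sum(c) for c in classes)
    sum_sq = sum(x * x for c in classes for x in c)
    correction = grand_total ** 2 / n_total                    # T^2 / N
    class_term = sum(sum(c) ** 2 / len(c) for c in classes)    # sum of T_i^2 / n_i
    ss_total = sum_sq - correction                             # (5)
    ss_within = sum_sq - class_term                            # (6)
    ss_between = class_term - correction                       # (7)
    df_between = len(classes) - 1
    df_within = n_total - len(classes)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_within, ss_between, f_ratio


rows = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
varieties = [list(col) for col in zip(*rows)]          # one list per variety
ss_total, ss_within, ss_between, f_ratio = one_way_anova(varieties)

# Shifting the origin to x = 12 leaves the sums of squares unaltered.
shifted = [[x - 12 for x in col] for col in varieties]
shifted_result = one_way_anova(shifted)
print(round(ss_total, 1), round(ss_within, 1), round(ss_between, 1), round(f_ratio, 2))
```

The same sums of squares are obtained with or without the change of origin, which is the property relied on in the text to lighten the arithmetic.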

100. Two criteria of classification 

Consider next the case in which the $N$ values of the data may
be classified according to two different criteria, $A$ and $B$. For
simplicity suppose that $A$ determines $h$ different classes, and $B$
determines $k$ different groups; also that the $hk$ values of the variable
are such that, in each of the $h$ classes there is one value from each
group, and in each of the $k$ groups one value from each class. For
the purpose of calculation the $hk$ values may be arranged in a
rectangular array of $h$ columns and $k$ rows, the columns
corresponding to classification $A$ and the rows to $B$. The double suffix
notation will indicate that $x_{ij}$ belongs to the $i$th class and the $j$th
group; and, in the rectangular array, this value occurs in the $i$th
column and the $j$th row. As before $\bar{x}$ denotes the general mean;
$\bar{x}_i$ is the mean of the values in the $i$th class, and $\bar{x}_j$ the mean of those
in the $j$th group.

The argument leading to (3) is valid here also, the resolution
expressed by that equation being with respect to the means of
columns. We may use (3) again to resolve the sum $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$,
this time, however, with respect to the means of rows. In doing so
we take as a new variable

$$X_{ij} = x_{ij} - \bar{x}_i,$$

each value of the original variable being diminished by the mean of
the column in which it lies. The values $X_{ij}$ may be arranged in $h$
columns and $k$ rows corresponding to those of $x_{ij}$. Now the general
mean of $X_{ij}$ is zero, since the means of $x_{ij}$ and $\bar{x}_i$ are each $\bar{x}$. Thus
$\bar{X} = 0$. Similarly the mean of the values $X_{ij}$ in the $j$th row is

$$\bar{X}_j = \bar{x}_j - \bar{x}.$$

Accordingly, on applying (3) to the quantities $X_{ij}$, but taking row
means instead of column means, we have

$$\sum_i \sum_j (X_{ij} - \bar{X})^2 = h \sum_j (\bar{X}_j - \bar{X})^2 + \sum_i \sum_j (X_{ij} - \bar{X}_j)^2$$

or its equivalent

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = h \sum_j (\bar{x}_j - \bar{x})^2 + \sum_i \sum_j (x_{ij} - \bar{x}_i - \bar{x}_j + \bar{x})^2. \tag{8}$$

Substituting this value in (3), and remembering that the numbers
$n_i$ are each equal to $k$, we have the resolution expressed by

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = k \sum_i (\bar{x}_i - \bar{x})^2 + h \sum_j (\bar{x}_j - \bar{x})^2 + \sum_i \sum_j Y_{ij}^2, \tag{9}$$

where $Y_{ij} = x_{ij} - \bar{x}_i - \bar{x}_j + \bar{x}$.

As in the preceding section, the expected value of the first member
of (9) is $(hk - 1)\sigma^2$. Similarly, on the assumption of a homogeneous
population, the expected values of the first two sums in the second
member are $(h - 1)\sigma^2$ and $(k - 1)\sigma^2$ respectively. That of the final
sum is therefore $(hk - h - k + 1)\sigma^2$. Thus the various sums in (9),
divided by $(hk - 1)$, $(h - 1)$, $(k - 1)$ and $(h - 1)(k - 1)$ respectively,
give unbiased estimates of the population variance based on degrees
of freedom represented by these divisors. When the population is
normal the four sums in (9), divided by $\sigma^2$, are distributed like $\chi^2$
with degrees of freedom as stated above; and the mean values of
the various sums follow from this. The above results are usually
tabulated:


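  Variation          D.F.                Sum of squares                          Mean square

  Between classes    $h - 1$             $k \sum_i (\bar{x}_i - \bar{x})^2$      The quotient of ‘sum
  Between groups     $k - 1$             $h \sum_j (\bar{x}_j - \bar{x})^2$      of squares’ by D.F.
  Error              $(h - 1)(k - 1)$    $\sum_i \sum_j Y_{ij}^2$                in each case

  Total              $hk - 1$            $\sum_i \sum_j (x_{ij} - \bar{x})^2$    -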

Degrees of freedom are additive, as well as sums of squares. The
variation corresponding to the factors of classification is
represented by the first two sums in the second member of (9). The
uncontrolled variation represented by the residual sum $\sum_i \sum_j Y_{ij}^2$ is
due to a variety of causes, which are grouped under the term
‘error’.

To test the hypothesis of homogeneity in the population, we
compare the two estimates of variance obtained between classes and
between groups with that obtained from error. If the factor
corresponding to either classification has a significant effect upon the
value of the variable, this will appear in the corresponding mean
square. In order to test the variance ratio by means of the $F$ table*
the population must be assumed normal. If either of the first two
estimates of variance is significantly different from that obtained
from error, the hypothesis of homogeneity is discredited. The
significance of the difference of the means of any two classes, or
any two groups, may be tested by means of the $t$ table, as in the
example below.

* The independence of these estimates of variance may be established
by Cochran’s method. See 1934, 6.

Example. An agricultural experiment was conducted to test the effects
of change of soil (5 blocks) and variety of wheat (7 different strains) on the
yield of grain. Each block was divided into seven plots, and the plots of each
block were assigned at random to the seven varieties. The yields, in bushels
per acre, are set out in the same rectangular array as in the example of
§ 99, columns corresponding to varieties and rows to blocks. Discuss the
significance of the variation of yield with the two factors.

With origin at $x = 12$ the values of $T$, the total sum of squares and
the sum of squares corresponding to varieties are as already found. Show
similarly that

$$T_j = 20,\ -6,\ 6,\ 28,\ -6,$$

$$\bar{x}_j = 2.86,\ -0.86,\ 0.86,\ 4,\ -0.86,$$

$$\sum_j T_j^2/7 = 184.6, \qquad \sum_j T_j^2/7 - T^2/35 = 134.2.$$

The tabulation of results is:

  Source of variation    D.F.    Sum of squares    Mean square    $F$

  Varieties                6          41.6            6.93         4.2**
  Blocks                   4         134.2           33.55        20.2**
  Error                   24          39.8            1.66          -

  Total                   34         215.6             -            -

The double asterisk indicates that the value of $F$ is significant at the 1 % level
(a single asterisk denoting significance at the 5 % level). Thus the yields of
the varieties are significantly different; and the experiment indicates a very
marked variation in soil fertility from one block to another.

We may use the $t$ test to examine more closely the difference between the
mean yields of any two varieties, say the second and third. The difference is
$\bar{x}_2 - \bar{x}_3 = 1.8$. To test the significance of this we use the estimate of variance
obtained from ‘error’, and corresponding to 24 d.f. This value, 1.66, is the
estimated variance of the yield of a single plot. The estimated variance of
the mean of 5 plots is thus 1.66/5; and that of the difference of the means of
two independent samples of 5 plots each is 1.66 × 2/5 = 0.664. The s.e. of
the difference of the means of two varieties is therefore 0.815 nearly. The
value of $t$ for the above two varieties is thus 1.8/0.815 = 2.2. For 24 d.f.
this is significant at the 5 % level. The least difference, $m$, that is significant
is given by $m/0.815 = 2.06$, so that $m = 1.68$ nearly. Show also that the
s.e. of the difference between the mean yields for two blocks is 0.69, and that
a difference of 1.4 is significant at the same level.
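The analysis of this example may be reproduced by machine. The sketch below is illustrative only; the function name `two_way_anova` and the keys of the result are hypothetical, and the yields are those of the array of § 99, with rows as blocks and columns as varieties.

```python
import math


def two_way_anova(table):
    """Two-way analysis of variance with one value in each cell.

    `table[j][i]` is the value in row (group) j and column (class) i."""
    k = len(table)                       # number of rows (groups)
    h = len(table[0])                    # number of columns (classes)
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (h * k)                    # T^2 / N
    ss_total = sum(x * x for row in table for x in row) - correction
    ss_columns = sum(sum(col) ** 2 for col in zip(*table)) / k - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / h - correction
    ss_error = ss_total - ss_columns - ss_rows                 # by subtraction
    ms_error = ss_error / ((h - 1) * (k - 1))
    return {
        "ss_columns": ss_columns,
        "ss_rows": ss_rows,
        "ss_error": ss_error,
        "ms_error": ms_error,
        "f_columns": (ss_columns / (h - 1)) / ms_error,
        "f_rows": (ss_rows / (k - 1)) / ms_error,
    }


yields = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
result = two_way_anova(yields)
# s.e. of the difference of two variety means (5 plots to each variety).
se_varieties = math.sqrt(2 * result["ms_error"] / len(yields))
# s.e. of the difference of two block means (7 plots to each block).
se_blocks = math.sqrt(2 * result["ms_error"] / len(yields[0]))
print({name: round(value, 2) for name, value in result.items()})
print(round(se_varieties, 3), round(se_blocks, 2))
```

The error sum of squares is found by subtraction, as in the text, and the two standard errors are those required for the $t$ tests on varieties and on blocks.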

101. The Latin square. Three criteria of classification 

In the general case of three criteria of classification, corresponding
to $h$, $k$, $p$ classes respectively, we should require a three-dimensional
generalization of the rectangular array employed above. In the
particular case for which $h = k = p$ this requirement is obviated by
an arrangement known as the Latin square. As it is customary to
use the letters $A$, $B$, $C$, ... to distinguish the different classes of one
of the three classifications, we may conveniently explain the Latin
square, of order $n$, as an arrangement of the $n$ letters $A$, $B$, $C$, ...
in the form of a square array, such that each letter occurs once in
each column and once in each row. Consequently each letter occurs
$n$ times in a Latin square of order $n$. The accompanying array is one
arrangement for a Latin square of order five.

    B   D   E   A   C
    C   A   B   E   D
    D   C   A   B   E
    E   B   C   D   A
    A   E   D   C   B

The triple classification and the Latin square arrangement are 

illustrated by the design of the following agricultural experiment. 
The variable is the yield of grain per acre, and the object of the 
experiment is to test the effect on yield due to change of manurial 
treatment, and to variation of soil in each of two perpendicular 
directions. A block of land is divided into plots, arranged in n 
parallel rows in one of the given directions and n parallel columns in 
the perpendicular direction. The n different treatments are dis- 
tributed at random among the n plots of each row, but in such a 
way that no two plots in the same column are given the same treat- 
ment. We thus have a Latin square in which letters correspond to 
treatments, while rows and columns correspond to soil variation in 
the given perpendicular directions. 

The necessary formulae for calculation are obtained by an easy
extension of the foregoing results. With the same notation as in the
preceding sections, the resolution expressed by (9) is still valid. The
final sum $\sum_i \sum_j Y_{ij}^2$ may be separated into components by applying
(3) to the variable $Y_{ij}$ and resolving, not with respect to the means
of rows or columns, but with respect to the means of letters. Let
$\bar{Y}$ denote the general mean of the values $Y_{ij}$, and $\bar{Y}_l$ the mean of the
values for the $l$th letter. Then, since the mean of each of the four
terms in $Y_{ij}$ has the same magnitude $\bar{x}$, we have $\bar{Y} = 0$. To calculate
$\bar{Y}_l$ we consider the four terms of $Y_{ij}$ separately. The mean value of
$x_{ij}$ corresponding to the $l$th letter we shall denote by $\bar{x}_l$. The mean
of the values $\bar{x}_i$ for the $l$th letter is $\bar{x}$, since each column contains the
$l$th letter once only. Similarly the mean of the values $\bar{x}_j$ for the $l$th
letter is $\bar{x}$. The last term of $Y_{ij}$ is constant, and we have the result

$$\bar{Y}_l = \bar{x}_l - \bar{x} - \bar{x} + \bar{x} = \bar{x}_l - \bar{x}.$$

Thus, on applying (3) to the values $Y_{ij}$ and the letters, we deduce

$$\sum_i \sum_j (Y_{ij} - \bar{Y})^2 = n \sum_l (\bar{Y}_l - \bar{Y})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_l)^2$$

or its equivalent

$$\sum_i \sum_j Y_{ij}^2 = n \sum_l (\bar{x}_l - \bar{x})^2 + \sum_i \sum_j (x_{ij} - \bar{x}_i - \bar{x}_j - \bar{x}_l + 2\bar{x})^2.$$

Substituting this value in (9), and putting $h = k = n$, we obtain
the required resolution

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = n \sum_i (\bar{x}_i - \bar{x})^2 + n \sum_j (\bar{x}_j - \bar{x})^2 + n \sum_l (\bar{x}_l - \bar{x})^2 + \sum_i \sum_j z_{ij}^2, \tag{10}$$

where $z_{ij} = x_{ij} - \bar{x}_i - \bar{x}_j - \bar{x}_l + 2\bar{x}$.

The expected value of the first member of (10) is $(n^2 - 1)\sigma^2$, and
on the assumption of a homogeneous population, that of each of the
first three sums on the right is $(n - 1)\sigma^2$, as proved in § 98.
Consequently

$$E\Big(\sum_i \sum_j z_{ij}^2\Big) = [n^2 - 1 - 3(n - 1)]\,\sigma^2 = (n - 1)(n - 2)\,\sigma^2.$$

Thus each of the first three sums in the second member of (10),
divided by $(n - 1)$, gives an unbiased estimate of $\sigma^2$ based on
$(n - 1)$ d.f.; and the final sum, divided by $(n - 1)(n - 2)$, gives an
unbiased estimate based on that number of freedoms. In the case
of a normal population the various sums in (10), divided by $\sigma^2$, are
distributed like $\chi^2$ with d.f. equal to those of the corresponding
estimates of $\sigma^2$. The results may be tabulated as shown below.
The first three mean squares are compared with that obtained from
error in the usual manner. The significance of the difference of the
means of two classes may be tested by the $t$ table as illustrated
earlier.

The various sums of squares may be calculated from the usual
formulae, putting $N = n^2$ and $h = k = n$. Thus

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - T^2/n^2.$$

Similarly,

$$n \sum_l (\bar{x}_l - \bar{x})^2 = \sum_l T_l^2/n - T^2/n^2,$$

where $T_l$ is the sum of the values for the $l$th letter; and so on. By
subtraction we have therefore

$$\sum_i \sum_j z_{ij}^2 = \sum_i \sum_j x_{ij}^2 - \Big(\sum_i T_i^2 + \sum_j T_j^2 + \sum_l T_l^2\Big)\Big/ n + 2T^2/n^2.$$

It is usual to find this sum by subtraction after the other sums
have been calculated.


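  Source of variation    D.F.                Sum of squares                          Mean square

  Columns                $n - 1$             $n \sum_i (\bar{x}_i - \bar{x})^2$      The quotient of ‘sum
  Rows                   $n - 1$             $n \sum_j (\bar{x}_j - \bar{x})^2$      of squares’ by D.F. in
  Treatments             $n - 1$             $n \sum_l (\bar{x}_l - \bar{x})^2$      each case
  Error                  $(n - 1)(n - 2)$    $\sum_i \sum_j z_{ij}^2$

  Total                  $n^2 - 1$           $\sum_i \sum_j (x_{ij} - \bar{x})^2$    -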

Example. An agricultural experiment was conducted on the Latin square
plan to test the effect on yield due to change of treatment (5 kinds) and also
to variation of soil in each of two perpendicular directions. The results are
set out in the Latin square below ($n = 5$), in which letters correspond to
treatments, while rows and columns correspond to the two perpendicular
directions. Are the effects on yield significant?

    A  7.4    D  8.9    E  5.8    B 12.0    C 14.3
    C 11.8    B  6.5    A  8.7    E  7.6    D  7.9
    D 10.1    C 17.9    B  9.0    A  8.5    E  7.1
    E  8.8    A 10.1    C 15.7    D 11.1    B  7.4
    B 11.8    E  8.8    D 14.3    C 18.4    A 10.1

Shifting the origin to $x = 10$ (i.e. reducing each of the above yields by 10)
the reader may easily verify that

$$T_i = -0.1,\ 2.2,\ 3.5,\ 7.6,\ -3.2;$$

$$T_j = -1.6,\ -7.5,\ 2.6,\ 3.1,\ 13.4;$$

$$T_l = -5.2,\ -3.3,\ 28.1,\ 2.3,\ -11.9.$$

Consequently $T^2/N = 10 \times 10/25 = 4.00$.

The various sums of squares are given by

  $\sum_i \sum_j x_{ij}^2 - T^2/N$  =  285.18 − 4.00  =  281.18    (Total)
  $\sum_i T_i^2/n - T^2/N$          =   17.02 − 4.00  =   13.02    (Columns)
  $\sum_j T_j^2/n - T^2/N$          =   50.95 − 4.00  =   46.95    (Rows)
  $\sum_l T_l^2/n - T^2/N$          =  194.89 − 4.00  =  190.89    (Treatments)
  Remainder                                           =   30.32    (Error)

The tabulation is:

  Source of variation    D.F.    Sum of squares    Mean square    $F$

  Columns                  4          13.02           3.255        1.3
  Rows                     4          46.95          11.737        4.6*
  Treatments               4         190.89          47.722       18.9**
  Error                   12          30.32           2.527         -

  Total                   24         281.18            -            -

For $\nu_1 = 4$ and $\nu_2 = 12$ the 5 and 1 % values of $F$ are 3.26 and 5.41. Thus
the variation in rows is significant, and that due to treatments is highly
significant.

Show that the s.e. of the difference of the means of two classes is 1.005, and
hence that a difference of means less than 2.2 is not significant at the 5 % level.
Thus the yield due to treatment $C$ is significantly greater than that due to
any other treatment; and the yield due to $D$ is significantly greater than
that due to $E$.
102. Significance of an observed correlation ratio 

We shall next consider some applications of analysis of variance†
to testing the significance of an observed correlation ratio, coefficient
or index, and to testing the linearity of a regression. First suppose
that a value, $\eta$, of the correlation ratio of $y$ on $x$, is obtained from a
random sample of $N$ pairs of values from a bivariate population
in which $y$ is normally distributed. We wish to test whether this
value is significant of an association between the two variables in
the population. Let the $N$ pairs of values of the variables be
arranged in arrays as in Chapter V, so that the $y$'s are classified
according to the corresponding values of $x$. Then since the subscript
$i$ indicates the array, and $j$ the position in that array, we have as in (3)

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i (\bar{y}_i - \bar{y})^2, \tag{11}$$

$\bar{y}_i$ being the mean value of $y$ in the $i$th array, and $n_i$ the number of
values in that array. In virtue of § 34 this equation corresponds to
the identity

$$N S_y^2 = N S_y^2 (1 - \eta^2) + N S_y^2 \eta^2, \tag{12}$$

$S_y^2$ being the variance of $y$ in the sample.

On the assumption that there is no association between the two
variables in the population, the $y$'s in each array may be regarded
as a random sample from the population of $y$'s. The various sums
in (11), divided by the variance $\sigma^2$ of $y$ in the population, are then
distributed like $\chi^2$ with $(N - 1)$, $(N - h)$ and $(h - 1)$ d.f., $h$ being the
number of arrays. The problem is therefore the same as in § 98.
On taking expected values of both members of (11) we have

$$(N - 1)\sigma^2 = (N - h)\sigma^2 + (h - 1)\sigma^2. \tag{13}$$

The various sums in (11), divided by the corresponding coefficients
in (13), yield unbiased estimates of $\sigma^2$. The tabulation is:

  Source of variation    D.F.       Sum of squares             Mean square

  Between arrays         $h - 1$    $N S_y^2 \eta^2$           $N S_y^2 \eta^2/(h - 1)$
  Within arrays          $N - h$    $N S_y^2 (1 - \eta^2)$     $N S_y^2 (1 - \eta^2)/(N - h)$

  Total                  $N - 1$    $N S_y^2$                  -

† According to the definition given by Wishart and Sanders (see § 106,
below) these applications should be classified under ‘Analysis of Covariance’.

To test whether the mean square between arrays is significantly
greater than that within arrays we have

$$F = \frac{\eta^2}{1 - \eta^2} \cdot \frac{N - h}{h - 1}, \qquad \text{with } \nu_1 = h - 1,\ \nu_2 = N - h.$$

Example. A random sample of 79 pairs, arranged in 7 arrays of $y$'s, gave
a correlation ratio of $y$ on $x$ equal to 0.4. Is this value significant?

Here $\nu_1 = 6$, $\nu_2 = 72$,

$$F = \frac{0.16}{0.84} \cdot \frac{72}{6} = 2.29^{*}.$$

Since the 5 % value of $F$ is 2.23, the above value is significant at that level.
We conclude that there is association between the variables in the population.
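The variance ratio of this example depends only on $\eta$, $N$ and $h$, and may be computed as follows. The sketch is illustrative only; the function name `correlation_ratio_f` is hypothetical.

```python
def correlation_ratio_f(eta, n_pairs, n_arrays):
    """Variance ratio for testing an observed correlation ratio eta."""
    df1 = n_arrays - 1                 # d.f. between arrays
    df2 = n_pairs - n_arrays           # d.f. within arrays
    f_ratio = (eta ** 2 / (1 - eta ** 2)) * (df2 / df1)
    return f_ratio, df1, df2


f_ratio, df1, df2 = correlation_ratio_f(0.4, 79, 7)
print(round(f_ratio, 2), df1, df2)
```

The value obtained is to be compared with the entry of the $F$ table for the two numbers of d.f. returned.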
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103. Significance of a regression function 
To test the significance of a regression function is to examine 
whether the data of the sample indicate any degree of association 
of the variables, of the type represented by the regression equation. 
We shall see that, in the case of a linear regression equation, this 
amounts to testing the significance of an observed value of r; while, 
in the case of an equation of curvilinear regression, it is equivalent 
to testing the significance of an observed value of the correlation 
index $R$.

Consider first a linear regression equation for a random sample
of N pairs of values from a bivariate normal population. If $Y_i$ is the
estimate of $y_{ij}$ given by the regression equation, we know by § 27 (31)

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - Y_i)^2 + \sum_i n_i (Y_i - \bar{y})^2. \tag{14}$$

The first sum of squares in the second member is due to deviations
from the regression function, and the second sum to the regression
function itself. In virtue of § 27 the various sums in (14) are equal
to the corresponding terms in the identity

$$N S_y^2 = N (1 - r^2) S_y^2 + N r^2 S_y^2.$$

On the assumption of an uncorrelated normal population these sums,
divided by $\sigma^2$, are distributed like $\chi^2$ with $N-1$, $N-2$ and 1 d.f.
respectively, as proved in § 82. The expected values of the various
sums in (14) are therefore the corresponding terms of the identity

$$(N-1)\sigma^2 = (N-2)\sigma^2 + \sigma^2,$$

and each sum thus gives an unbiased estimate of $\sigma^2$, on division by
the appropriate number of d.f. The tabulated results are:


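Source of variation                      D.F.     Sum of squares                               Mean square
Regression function                      1        $\sum_i n_i (Y_i - \bar{y})^2$               $N r^2 S_y^2$
Deviation from the regression function   $N-2$    $\sum_i \sum_j (y_{ij} - Y_i)^2$             $N (1-r^2) S_y^2/(N-2)$
Total                                    $N-1$    $\sum_i \sum_j (y_{ij} - \bar{y})^2$         -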


If the mean square due to the regression function is significantly
greater than that due to deviations from this function, we conclude
that there is a real association between the variables, of the type
indicated by the regression equation. To test the significance we
have for the variance ratio

$$F = (N-2)\, r^2/(1-r^2); \qquad \nu_1 = 1, \quad \nu_2 = N-2.$$

The method is thus equivalent to testing the significance of r; and
it will be noted that the above statistic F is the square of the statistic
t of § 89 (8). The two tests are therefore equivalent.

Example, A correlation of 0*6 is obtained from a random sample of 26 
pairs from a normal population. Is this value significant? 

Here $N = 26$, $\nu_1 = 1$, $\nu_2 = 24$, and F is found from the variance ratio above.

From the table we see that this value of F is highly significant of correlation 
in the population. 

Obtain the same result by use of the t table. 
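
The equivalence of the two tests may be made explicit. Assuming that the statistic of § 89 (8) is the usual t for a correlation coefficient with $N-2$ d.f. (the expression below is that standard form, not a quotation of § 89), its square is the variance ratio of this section:

```latex
t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}, \qquad
t^{2} = \frac{(N-2)\,r^{2}}{1-r^{2}} = F
\qquad (\nu_1 = 1,\ \nu_2 = N-2).
```

Hence a value of F beyond its 5 % point for 1 and $N-2$ d.f. corresponds to a value of $|t|$ beyond the 5 % point of t for $N-2$ d.f.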

In the case of a correlation index $R$ associated with a curved
regression line, the formulae of § 41 show that the above argument
holds if $r$ is replaced by $R$, except that the numbers of d.f. must be
altered. If $k$ is the number of statistics that must be calculated from
the sample to obtain the regression equation, the number of d.f.
associated with deviations from the regression function is $N-k$,
and the number associated with the regression function itself is
$k-1$. For testing the significance of the regression function we have
therefore

$$F = \frac{R^2}{1-R^2} \cdot \frac{N-k}{k-1}; \qquad \nu_1 = k-1, \quad \nu_2 = N-k.$$

104. Test for non-linearity of regression

Considering the random sample of N pairs of values from a bivariate
normal population, let us return to equation (11). In virtue
of § 36 (18), the final sum in that equation may be further resolved as

$$\sum_i n_i (\bar{y}_i - \bar{y})^2 = \sum_i n_i (\bar{y}_i - Y_i)^2 + \sum_i n_i (Y_i - \bar{y})^2, \tag{15}$$

the various sums being equal to the corresponding terms of the
identity

$$N \eta^2 S_y^2 = N (\eta^2 - r^2) S_y^2 + N r^2 S_y^2.$$

The three sums in (15), divided by $\sigma^2$, are distributed like $\chi^2$ with
$h-1$, $h-2$ and 1 d.f. respectively; and on taking expected values
of both members of (15) we obtain the identity

$$(h-1)\sigma^2 = (h-2)\sigma^2 + \sigma^2.$$

The various sums in (15), divided by the corresponding numbers of
d.f., give unbiased estimates of $\sigma^2$. That obtained from the first
sum on the right is associated with deviations of the means of arrays
from the line of regression of y on x. On the assumption of linearity
of regression this sum is due to sampling errors; and the estimate
of $\sigma^2$ obtained from it should not be significantly greater than that
derived from the sum of squares within arrays, i.e. from

$$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2.$$

Tabulating as usual we have:


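Source of variation                        D.F.     Sum of squares                            Mean square
Linear regression                          1        $\sum_i n_i (Y_i - \bar{y})^2$            $N r^2 S_y^2$
Deviation of means from regression line    $h-2$    $\sum_i n_i (\bar{y}_i - Y_i)^2$          $N (\eta^2 - r^2) S_y^2/(h-2)$
Within arrays                              $N-h$    $\sum_i \sum_j (y_{ij} - \bar{y}_i)^2$    $N (1 - \eta^2) S_y^2/(N-h)$
Total                                      $N-1$    $\sum_i \sum_j (y_{ij} - \bar{y})^2$      -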


For testing whether the second mean square is significantly greater
than the third, we have the variance ratio

$$F = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N-h}{h-2}; \qquad \nu_1 = h-2, \quad \nu_2 = N-h.$$

If the value of F is significant, the assumption of linearity of regression
is discredited. It should be observed that the value of F
depends not only on the difference $\eta^2 - r^2$ but also on N and h.
Thus a knowledge of $\eta^2 - r^2$ is by itself insufficient to decide the
question of linearity of regression.


Example. A random sample of 200 pairs of values from a bivariate normal
population, when grouped in 10 arrays of y's, gave values $r = 0.3$ and $\eta = 0.4$.
Are these results consistent with the assumption of linearity of regression?

Here $N = 200$, $h = 10$,

$$F = \frac{0.07}{0.84} \times \frac{190}{8} = 1.98; \qquad \nu_1 = 8, \quad \nu_2 = 190.$$

This value is just on the 5 % level of significance. The assumption of linearity
of regression is thus rather discredited.
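
The dependence of F on N and h as well as on $\eta^2 - r^2$ is readily seen by computing it. The sketch below is a minimal Python check of the example just given; the function name `linearity_f` is an illustrative choice of ours.

```python
def linearity_f(r, eta, n_pairs, n_arrays):
    """Variance ratio for testing linearity of regression of y on x."""
    nu1 = n_arrays - 2            # d.f. for deviation of array means (h - 2)
    nu2 = n_pairs - n_arrays      # d.f. within arrays (N - h)
    f = (eta**2 - r**2) / (1 - eta**2) * nu2 / nu1
    return f, nu1, nu2


# Example of the text: N = 200, h = 10, r = 0.3, eta = 0.4.
f, nu1, nu2 = linearity_f(0.3, 0.4, 200, 10)
print(round(f, 2), nu1, nu2)      # 1.98 8 190
```

With the same r and $\eta$ but a smaller sample, the ratio $(N-h)/(h-2)$ falls and F with it, which is the point made at the end of the section.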


Analysis of Covariance

105. Resolution of the 'sum of products'. One criterion of
classification

Analysis of covariance has been described as 'the technique of
testing for homogeneity in problems dealing with two or more
correlated variables'.* Its main use is to test the significance of
the difference between the mean values of a variate y in certain
classes, when these have been corrected for differences in some
concomitant variable x. The method of doing this will be explained
shortly; but we must first study the necessary algebraical tools.

The resolution of the total variation into components, already 
studied in analysis of variance, has its counterpart in analysis of 
covariance ; and the algebra of the two processes runs along parallel 
lines. The total covariation of a bivariate sample, represented by 
the sum of the products of the deviations of the variates from their 
means, may be resolved into components associated with different 
factors; and from these components, and the corresponding com- 
ponents of variation of x and of y, estimates of the coefficient of 
regression (or correlation) in the population are determined. The 
estimate from 'error' is tested for significance as in §§ 103, 89 or 90;
and, if this proves to be significant, the other estimates are tested 
by comparison with it. We are thus able to estimate the effects of 
the various factors on the degree of association of the variates. 

* Wishart and Sanders, 1936, 3, p. 46.

Suppose that the data consist of N pairs of corresponding values
of the two variates, x and y, and that these may be grouped in h
different classes according to a certain criterion. With the double
suffix notation, $(x_{ij}, y_{ij})$ is the $j$th pair of values in the $i$th class
$(i = 1, \dots, h;\ j = 1, \dots, n_i)$. The numbers of pairs in the different
classes are not necessarily equal. Let $n_i$ be the number of pairs in the
$i$th class, so that $\sum_i n_i = N$. As before let $\bar{x}$, $\bar{y}$ denote the general
means of the two variables, and $\bar{x}_i$, $\bar{y}_i$ the means of their values in
the $i$th class. Then

$$\sum_j (x_{ij} - \bar{x}_i) = 0 = \sum_j (y_{ij} - \bar{y}_i). \tag{16}$$

The deviations of $x_{ij}$ and $y_{ij}$ from their general means are expressible as

$$x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x}), \qquad y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}).$$

On forming their product, and summing over all pairs of values,
we have

$$\sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = \sum_i \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) + \sum_i n_i (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}), \tag{17}$$

the remaining sums disappearing in virtue of (16). Corresponding
to this we have also the resolution of the sum of squares of the x's,
expressed by (3), and that of the y's given by a similar formula.

Suppose now that the N pairs of values constitute a random
sample from a homogeneous population, in which the covariance
is $\mu_{11}$. Then, by § 61, the expected value of the sum in the first
member of (17) is $(N-1)\mu_{11}$. Similarly, since the values in the $i$th
class may be regarded as a random sample of $n_i$ pairs,

$$E\Big(\sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i)\Big) = (n_i - 1)\mu_{11},$$

so that

$$E\Big(\sum_i \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i)\Big) = \sum_i (n_i - 1)\mu_{11} = (N-h)\mu_{11}.$$

Consequently, since the expectations of the two members of (17)
must be equal, that of the final sum in the equation must be
$(h-1)\mu_{11}$. Thus the various sums in (17), divided by $N-1$, $N-h$ and
$h-1$ respectively, give unbiased estimates of the covariance of the
population, based on numbers of d.f. represented by these divisors.

Following Wishart and Sanders* we may conveniently denote
the three sums in (17) by $C''$, $C'$ and $C$ respectively, the equation
being then equivalent to $C'' = C' + C$. The corresponding sums of
squares for the x's will be denoted by $A''$, $A'$ and $A$, and those for the
y's by $B''$, $B'$ and $B$. Then (3) and its counterpart are equivalent to

$$A'' = A' + A, \qquad B'' = B' + B.$$

The resolution of sums of squares and products may be combined
in a single table as follows:


Source of variation   D.F.     Sums of squares   Sum of products   Coefficient of regression   Coefficient of correlation
Between classes       $h-1$    $A$, $B$          $C$               $b = C/A$                   $r = C/\sqrt{AB}$
Within classes        $N-h$    $A'$, $B'$        $C'$              $b' = C'/A'$                $r' = C'/\sqrt{A'B'}$
Total                 $N-1$    $A''$, $B''$      $C''$             -                           -


We thus obtain different estimates of the coefficients of regression
and correlation. By means of §§ 103, 89 or 90 we first test whether
the estimate obtained within classes is significant of correlation in
the population. If it proves to be significant, we proceed to test the
differences between the class means after these have been corrected
for regression. Incidentally we may also test the significance of the
difference between $b$ and $b'$.

106. Calculation of the sums of products 
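The sum of products of the deviations of N pairs of numbers
$x_s$, $y_s$ $(s = 1, \dots, N)$, from their means $\bar{x}$, $\bar{y}$ is given by

$$\sum_s (x_s - \bar{x})(y_s - \bar{y}) = \sum_s x_s y_s - TT'/N, \tag{18}$$

where $T = \sum_s x_s = N\bar{x}$, $T' = \sum_s y_s = N\bar{y}$.

Applying this formula to the sums in (17) we have

$$C'' = \sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = \sum_i \sum_j x_{ij} y_{ij} - TT'/N, \tag{19}$$

$T$, $T'$ being the grand totals for x and y respectively. Similarly, on
summing first for the values in the $i$th class, we have

$$\sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) = \sum_j x_{ij} y_{ij} - T_i T_i'/n_i,$$

where

$$T_i = \sum_j x_{ij} = n_i \bar{x}_i, \qquad T_i' = \sum_j y_{ij} = n_i \bar{y}_i,$$

these being the sums of the values of x and y in the $i$th class.
Consequently

$$C' = \sum_i \sum_j x_{ij} y_{ij} - \sum_i T_i T_i'/n_i. \tag{20}$$

Subtraction of (20) from (19) gives the remaining sum

$$C = \sum_i T_i T_i'/n_i - TT'/N, \tag{21}$$

which may, of course, be obtained independently.

* Wishart and Sanders, 1936, 3, p. 48.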

Since the deviations from the means are independent of the choice
of origin, the results obtained by using (19), (20) and (21) are
unaltered by a change of the origins of x and y. In other words, if
all the values $x_{ij}$ are decreased (or increased) by the same constant,
and all the values $y_{ij}$ by another constant, the results obtained for
the three sums of products are unchanged. The arithmetic may often
be simplified in this manner, large numbers being replaced by much
smaller ones.
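
Formulae (19), (20) and (21), and their independence of the origin, can be illustrated in a few lines of Python. The sketch below is ours: the function name `sums_of_products` and the small data set are hypothetical, chosen only to show the computation from raw products and class totals.

```python
def sums_of_products(classes):
    """classes: list of classes, each a list of (x, y) pairs.

    Returns (C, C', C''): between classes, within classes and total,
    computed from raw products and totals as in (21), (20) and (19).
    """
    n = sum(len(c) for c in classes)
    t = sum(x for c in classes for x, _ in c)          # grand total T
    t_prime = sum(y for c in classes for _, y in c)    # grand total T'
    raw = sum(x * y for c in classes for x, y in c)    # sum of x*y
    by_class = sum(
        sum(x for x, _ in c) * sum(y for _, y in c) / len(c) for c in classes
    )                                                  # sum of T_i T_i' / n_i
    c_total = raw - t * t_prime / n                    # (19)
    c_within = raw - by_class                          # (20)
    c_between = by_class - t * t_prime / n             # (21)
    return c_between, c_within, c_total


# Hypothetical data: two classes of unequal size.
data = [[(101, 52), (103, 55), (104, 54)], [(107, 50), (110, 49)]]
# Same data with the origin moved to x = 100, y = 50.
shifted = [[(x - 100, y - 50) for x, y in c] for c in data]

original = sums_of_products(data)
after_shift = sums_of_products(shifted)
print(original, after_shift)
```

The two printed triples agree apart from rounding in the last figures, while the shifted data involve much smaller numbers.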


107. Examination and elimination of the effect of regression 
The method of § 103, for testing the significance of an estimated 
regression, consists in resolving the total sum of squares of the y’s 
into two components, one due to regression and the other to devia- 
tions from the regression line, and then comparing the estimates 
of population variance derived from these two sums. It will be 
observed that the test was applied to a sample without any attempt 
at classification, that is to say, quite apart from any resolution of 
the variation into components due to factors other than regression. 
In terms of the total sums of squares and products, $A''$, $B''$, $C''$,
the sum of squares due to regression is expressible as

$$N S_y^2 r^2 = B'' \cdot \frac{C''^2}{A'' B''} = C''^2/A'', \tag{22}$$

and that due to deviation from the regression line is $B'' - C''^2/A''$.
The former corresponds to 1 d.f. and the latter to $N-2$, which is
one less than for the total sum of squares.

Now we have seen how, after the values in the sample have been
classified, we may resolve the total variation of the variables, and
their covariation, into components between the means of classes
and within classes respectively. From each set of components we
may calculate an estimate of the regression coefficient and a line
of regression; and the corresponding variation of the y's may be
further resolved by the method of § 103 into a part due to regression,
and another due to deviation from the regression line, the number
of d.f. for the latter being one less than the total d.f. for that component.
Applying this process to each line of the table at the end
of § 105, we find from each a sum of squares of deviations from the
corresponding line of regression, with d.f. as indicated in the
following table:*

Source of variation   D.F.       Residual sum of squares
Between classes       $h-2$      $B - C^2/A$
Within classes        $N-h-1$    $B' - C'^2/A'$
Total                 $N-2$      $B'' - C''^2/A''$

leading to different estimates of the variance of y in the population.
Denoting the estimate within classes by $s^2$ we have

$$s^2 = \frac{B' - C'^2/A'}{N-h-1}. \tag{23}$$

This is the estimate of the variance of y after correction for regression.
The significance of any other estimate of the variance is tested
by comparison with it.

The significance of any apparent regression is first tested by the
method of § 103, applied to the sums within classes; that is to say,
we compare the estimate $C'^2/A'$ of variance, due to regression and
based on 1 d.f., with the above estimate $s^2$ based on $N-h-1$ d.f.
If the regression proves to be significant we proceed to test the
differences between class means after correction for regression.
Subtracting the second row of the above table from the third we
have the sum of squares $B + C'^2/A' - C''^2/A''$ with $h-1$ d.f., which
we also compare with the sum of squares within classes. If it proves
to be significant we conclude that there are differences between the
class means, after these have been adjusted by the regression
coefficient $b'$ within classes.

* Cf. Wishart and Sanders, 1936, 3, p. 49.

Now in the first line of the above table we have a sum of squares
between class means, adjusted by the regression coefficient $b$, and
corresponding to $(h-2)$ d.f. The testing of this sum in place of the
above raises the question of the significance of the difference between
$b$ and $b'$. This may be examined as follows. Tabulating the two sums
of squares just mentioned, and their d.f., we may arrange the work:


Between classes    D.F.     Sum of squares
Adjusted by $b'$   $h-1$    $B + C'^2/A' - C''^2/A''$
Adjusted by $b$    $h-2$    $B - C^2/A$
Difference         1        $C^2/A + C'^2/A' - C''^2/A''$


Now the sum in the last row may be expressed

$$\frac{C^2}{A} + \frac{C'^2}{A'} - \frac{(C+C')^2}{A+A'} = \frac{AA'}{A+A'}\Big(\frac{C}{A} - \frac{C'}{A'}\Big)^2 = \frac{AA'(b-b')^2}{A+A'}.$$

Comparing this estimate of variance, based on 1 d.f., with the
estimate $s^2$ based on $N-h-1$, we have the variance ratio

$$F = \frac{AA'(b-b')^2}{A+A'} \cdot \frac{N-h-1}{B' - C'^2/A'}, \tag{24}$$

$\nu_1 = 1$, $\nu_2 = N-h-1$.

A rare value of F indicates that the two coefficients, $b$ and $b'$, are
significantly different; in which case the factor of classification has
an effect upon the degree of association of the variables.
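
The reduction of the 1 d.f. sum to a multiple of $(b-b')^2$ uses only the relations $A'' = A + A'$, $C'' = C + C'$, $b = C/A$ and $b' = C'/A'$. A sketch of the intermediate algebra, which the text leaves to the reader, is:

```latex
\frac{C^{2}}{A} + \frac{C'^{2}}{A'} - \frac{(C+C')^{2}}{A+A'}
  = \frac{C^{2}A'(A+A') + C'^{2}A(A+A') - AA'(C+C')^{2}}{AA'(A+A')}
  = \frac{(CA' - C'A)^{2}}{AA'(A+A')}
  = \frac{AA'}{A+A'}\left(\frac{C}{A} - \frac{C'}{A'}\right)^{2}
  = \frac{AA'(b-b')^{2}}{A+A'} .
```

In the numerator the terms $C^2 AA'$ and $C'^2 AA'$ cancel, leaving $C^2A'^2 - 2AA'CC' + C'^2A^2$, which is the perfect square shown.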

Example. To examine the relation between the yield of grain (x bushels/acre)
and the cost of production (y shillings/bushel) in a certain state, six
districts were chosen at random, and in each a random selection of five farms
was made. The results for the season are tabulated below, columns corresponding
to districts. Is there any significant indication of correlation
between the variables; and, if so, does its value vary with the district?


  x     y      x     y      x     y      x     y      x     y      x     y
 13    3.5    10    4.7    16    2.8     9    4.5    12    3.8    15    4.7
 11    4.0     9    5.3    13    3.0    12    3.6    17    2.2    11    5.1
 18    2.5     8    5.6    15    2.8    10    4.0    19    2.0     9    6.0
 14    3.7    12    4.0    10    4.2    13    3.4    11    4.6    14    4.3
 12    4.3    11    4.4    12    3.2     8    5.5    14    3.4    17    3.9

Shifting the origin to $x = 12$, $y = 3$, and rewriting the table, the student
will easily verify that

$T_i = 8, -10, 6, -8, 13, 6$;  $T = 15$,

$T_i' = 3, 9, 1, 6, 1, 9$;  $T' = 29$,

and hence that

$T^2/N = 7.5$,  $T'^2/N = 28.03$,  $TT'/N = 14.5$,  $\sum_i T_i T_i'/n_i = -8.2$,

$\sum_i \sum_j x_{ij}^2 = 259$,  $\sum_i \sum_j y_{ij}^2 = 56.96$,  $\sum_i \sum_j x_{ij} y_{ij} = -54.8$.

Consequently

$A = 93.8 - 7.5 = 86.3$,  $A' = 259 - 93.8 = 165.2$,  $A'' = 259 - 7.5 = 251.5$,

$B = 41.8 - 28.03 = 13.77$,  $B' = 56.96 - 41.8 = 15.16$,  $B'' = 56.96 - 28.03 = 28.93$,

$C = -8.2 - 14.5 = -22.7$,  $C' = -54.8 + 8.2 = -46.6$,  $C'' = -54.8 - 14.5 = -69.3$.


The tabulated results are: 


Source of variation   D.F.   Sum of squares ($x^2$)   Sum of squares ($y^2$)   Sums of products   Coefficient of regression
Between districts       5           86.3                     13.77                  -22.7                -0.263
Within districts       24          165.2                     15.16                  -46.6                -0.282
Total                  29          251.5                     28.93                  -69.3                -0.276


First, to test the significance of $b'$ we have, as explained above,

$$F = \frac{C'^2/A'}{B' - C'^2/A'} \times (N-h-1) = \frac{13.14 \times 23}{2.02} = 149^{**};$$

$\nu_1 = 1$, $\nu_2 = 23$.

Thus the value of $b'$ is highly significant of negative correlation.

To test the differences of class means, after these have been corrected for
regression, we calculate the estimate of variance

$$\frac{B + C'^2/A' - C''^2/A''}{h-1} = \frac{13.77 + 13.14 - 19.09}{5} = 1.56,$$

and compare this with the estimate within classes

$$s^2 = 2.02/23 = 0.088.$$

The variance ratio is

$$F = 1.56/0.088 = 17.7^{**},$$

which, for $\nu_1 = 5$ and $\nu_2 = 23$, is highly significant. We conclude that the
cost of production, corrected for differences in yield, varies significantly from
district to district.

By using (24) show that $b$ and $b'$ do not differ significantly.
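
The whole of this example may be checked by machine. The Python sketch below is ours, not the text's: the helper `resolve` is a hypothetical name, and the data are entered with the digits that reproduce the printed totals $T_i$, $T_i'$ and the printed sums of squares and products, which is an assumption wherever the printed table is hard to read. No change of origin is needed, since the results do not depend on it.

```python
# (x, y) pairs for the six districts, five farms in each.
districts = [
    [(13, 3.5), (11, 4.0), (18, 2.5), (14, 3.7), (12, 4.3)],
    [(10, 4.7), (9, 5.3), (8, 5.6), (12, 4.0), (11, 4.4)],
    [(16, 2.8), (13, 3.0), (15, 2.8), (10, 4.2), (12, 3.2)],
    [(9, 4.5), (12, 3.6), (10, 4.0), (13, 3.4), (8, 5.5)],
    [(12, 3.8), (17, 2.2), (19, 2.0), (11, 4.6), (14, 3.4)],
    [(15, 4.7), (11, 5.1), (9, 6.0), (14, 4.3), (17, 3.9)],
]


def resolve(classes, u, v):
    """Between, within and total sums of products of coordinates u and v."""
    n = sum(len(c) for c in classes)
    tu = sum(p[u] for c in classes for p in c)
    tv = sum(p[v] for c in classes for p in c)
    raw = sum(p[u] * p[v] for c in classes for p in c)
    by_class = sum(
        sum(p[u] for p in c) * sum(p[v] for p in c) / len(c) for c in classes
    )
    return by_class - tu * tv / n, raw - by_class, raw - tu * tv / n


A, A1, A2 = resolve(districts, 0, 0)      # A, A', A''
B, B1, B2 = resolve(districts, 1, 1)      # B, B', B''
C, C1, C2 = resolve(districts, 0, 1)      # C, C', C''

N, h = 30, 6
s2 = (B1 - C1**2 / A1) / (N - h - 1)                          # (23)
F_regression = (C1**2 / A1) / s2                              # test of b'
F_means = (B + C1**2 / A1 - C2**2 / A2) / (h - 1) / s2        # adjusted class means
F_slopes = A * A1 * (C / A - C1 / A1) ** 2 / (A + A1) / s2    # (24)

print(round(C / A, 3), round(C1 / A1, 3), round(C2 / A2, 3))
print(round(F_regression, 1), round(F_means, 1), round(F_slopes, 2))
```

Carried to full precision the first two ratios come out near 150 and 17.8, slightly above the values 149 and 17.7 obtained in the text from rounded intermediate figures; the ratio from (24) is well below unity, so that $b$ and $b'$ do not differ significantly.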

108. Two criteria of classification 

Suppose next that the N pairs of values $(x_{ij}, y_{ij})$ in the sample
may be classified according to two different criteria, H and K, the
former determining h different classes and the latter k different
groups. Suppose for simplicity that one pair of values from each
group is present in each class, and vice versa, so that $N = hk$. The
pairs of values may be arranged in a rectangular array of h columns
and k rows, in which columns correspond to classification H and
rows to K. The pair $(x_{ij}, y_{ij})$ belongs to the $i$th class and the $j$th
group, and appears in the $i$th column and the $j$th row.

The resolution of the sum of products expressed in (17) is still
valid. But the first sum in the second member of the equation may
be further resolved. Write $\bar{x}_{i\cdot}$, $\bar{y}_{i\cdot}$ for the means of the $i$th class
(column) and $\bar{x}_{\cdot j}$, $\bar{y}_{\cdot j}$ for those of the $j$th group (row). Thus if

$$X_{ij} = x_{ij} - \bar{x}_{i\cdot}, \qquad Y_{ij} = y_{ij} - \bar{y}_{i\cdot},$$

and we apply the resolution expressed by (17) to the sum of products
$\sum_i \sum_j X_{ij} Y_{ij}$, making it however with respect to the means of
rows instead of the means of columns, we find by the same argument
as in § 100 that

$$\sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = k \sum_i (\bar{x}_{i\cdot} - \bar{x})(\bar{y}_{i\cdot} - \bar{y}) + h \sum_j (\bar{x}_{\cdot j} - \bar{x})(\bar{y}_{\cdot j} - \bar{y}) + \sum_i \sum_j (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}). \tag{25}$$

Denoting the various sums in this equation by $C''$, $C_1$, $C_2$, $C'$ we may
write it briefly

$$C'' = C_1 + C_2 + C'.$$

On the assumption that the pairs of values constitute a random
sample from a homogeneous population, the expected value of the
first member of (25) is $(N-1)\mu_{11}$, where $\mu_{11}$ is the covariance in the
population. That of the first sum in the second member has been
shown to be $(h-1)\mu_{11}$; and similarly that of the second sum is
$(k-1)\mu_{11}$. Consequently the expectation of the final sum is

$$(N - h - k + 1)\mu_{11} = (h-1)(k-1)\mu_{11}.$$

On dividing the various sums by $(N-1)$, $(h-1)$, $(k-1)$ and
$(h-1)(k-1)$ respectively, we thus obtain four unbiased estimates
of the covariance of the homogeneous population, with d.f. represented
by these divisors. In § 100 we obtained the corresponding
estimates of the variance of x; and similar equations give estimates
of the variance of y in the population. From the partial sums we
obtain three independent estimates of the coefficients of regression
and correlation. Denoting the sums of squares of the x's by $A''$, $A_1$,
$A_2$, $A'$ and those of the y's by $B''$, $B_1$, $B_2$, $B'$, we may tabulate
the results:


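Source of variation   D.F.           Sums of squares ($x^2$)   Sums of squares ($y^2$)   Sum of products   Coefficient of regression
Between classes       $h-1$          $A_1$                     $B_1$                     $C_1$             $b_1 = C_1/A_1$
Between groups        $k-1$          $A_2$                     $B_2$                     $C_2$             $b_2 = C_2/A_2$
Error                 $(h-1)(k-1)$   $A'$                      $B'$                      $C'$              $b' = C'/A'$
Total                 $N-1$          $A''$                     $B''$                     $C''$             -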


The first step is to test the significance of the regression coefficient
$b'$ obtained from the error sums; and this is done exactly as in § 107,
except that the number of d.f. for error is here $(h-1)(k-1)$. Thus
the residual sum of squares due to error is $B' - C'^2/A'$, with $N-h-k$
d.f., giving the estimate of variance of y

$$s^2 = (B' - C'^2/A')/(N-h-k).$$

With this we compare the estimate $C'^2/A'$ obtained from the regression
sum of squares corresponding to 1 d.f., and the result settles
whether $b'$ is significant. If it is, we proceed to test the significance of
the differences of the means of y in classes (or groups) after these have
been corrected for regression on x. This may be done as follows. Take
the first and third rows of the above table, and by addition form a
combined source of variation, $S$ = (classes + error), with sums of
squares and products $A$, $B$, $C$. Thus we have:


Source of variation   D.F.           Sums of squares and products
Classes               $h-1$          $A_1$, $B_1$, $C_1$
Error                 $(h-1)(k-1)$   $A'$, $B'$, $C'$
Total, $S$            $N-k$          $A$, $B$, $C$

where $A = A_1 + A'$, etc. From these we form the corresponding
residual sums of squares as in § 107, viz.

Source of variation   D.F.       Residual sum of squares
Classes               $h-2$      $B_1 - C_1^2/A_1$
Error                 $N-h-k$    $B' - C'^2/A'$
$S$                   $N-k-1$    $B - C^2/A$


Subtracting the second row from the third we obtain a residual
sum of squares $B_1 + C'^2/A' - C^2/A$ corresponding to $h-1$ d.f.
Comparing the estimate of variance obtained from this with the
estimate $s^2$ we are able to decide whether the class means differ
significantly after correction for regression.

Incidentally, we may deduce a test for the significance of the
difference $b_1 - b'$. For, on subtracting the residual sums of squares
for classes and error from that for $S$, we have the sum

$$C_1^2/A_1 + C'^2/A' - C^2/A$$

with 1 d.f. As in § 107 this sum is expressible as

$$\frac{A_1 A' (b_1 - b')^2}{A_1 + A'},$$

so that in testing it we are testing the difference $b_1 - b'$. Comparing
it with the residual sum of squares for error we have a variance ratio

$$F = \frac{A_1 A' (b_1 - b')^2}{A_1 + A'} \cdot \frac{N-h-k}{B' - C'^2/A'}, \tag{26}$$

$\nu_1 = 1$, $\nu_2 = N-h-k$.

We thus determine whether $b_1$ is significantly different from $b'$;
and we may test $b_2$ by a similar formula.*

* For a numerical illustration see Ex. XI, 15, below.

EXAMPLES XI

1. Verify as follows that the expected value of the final sum in
§ 97 (3) is $(h-1)\sigma^2$. If $\mu$ is the mean of the population,

$$\sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i (\bar{x}_i - \mu)^2 - N(\bar{x} - \mu)^2.$$

Since $E(\bar{x}_i - \mu)^2$ is the variance of the mean of a sample of $n_i$ members,
and $E(\bar{x} - \mu)^2$ is that of the mean of a sample of N members, it follows
that

$$E \sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i (\sigma^2/n_i) - N\sigma^2/N = (h-1)\sigma^2.$$

Verify the mean value of $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$ similarly.

2. With the notation of § 100, $Y_{ij} = x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}$, show that
$\sum_i Y_{ij} = 0 = \sum_j Y_{ij}$. Thus the sum of the Y's in any row or in any
column is zero. Only $(h-1)(k-1)$ of the quantities $Y_{ij}$ are independent.

3. To test the significance of the variation of the retail price of a
certain commodity among four large cities, A, B, C and D, seven
shops were chosen at random in each city, and the prices observed
were as follows:

A: 6s. 10d., 6s. 7d., 6s. 1d., 5s. 9d., 5s. 9d., 5s. 3d., 5s. 1d.

B: 7s., 6s. 10d., 6s. 8d., 6s. 7d., 6s. 4d., 5s. 8d., 5s. 2d.

C: 7s. 4d., 7s., 6s. 8d., 5s. 8d., 5s. 8d., 5s. 6d., 5s. 6d.

D: 6s. 7d., 6s. 5d., 6s. 4d., 6s. 2d., 6s., 5s. 8d., 5s. 4d.

Do the data indicate that the prices for the four cities are significantly
different?

Working in pence, analysis of variance gives:

Source of variation   D.F.   Sum of squares   Mean square
Between cities          3         94.96          31.65
Within cities          24       1446.00          60.25
Total                  27       1540.96            -

Thus the mean square between cities is less than that within cities,
and is therefore not significantly large. Neither is it significantly
small because, for $\nu_1 = 24$ and $\nu_2 = 3$, a value of F in the neighbourhood
of 2 is not significant.

Show that the S.E. of the difference of the means for two cities is
4.15d.; and hence for no two cities is the difference of the mean prices
significant.

4. Show that, if $\bar{x}_1$ and $\bar{x}_2$ are the means of two classes of $n_1$ and
$n_2$ members respectively, and $s^2$ is the estimate of population variance
derived from error and corresponding to $\nu$ d.f., then $s\sqrt{1/n_1 + 1/n_2}$
is the S.E. of the difference of the means of the two classes, and
hence that the statistic

$$\frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}$$

is distributed like t for $\nu$ d.f. This formula provides a method of testing
the significance of the difference of the means of two unequal classes.


5. In a certain large country, to test the variation in the price
of a certain commodity with district and season, six districts were
chosen at random, and the price was observed in each on six random
occasions throughout the year. The observations are recorded in pence
in the table below, in which columns correspond to seasons and rows
to districts. Discuss the significance of the variation with season and
with district.

 11   15   24   19   21   17
 12   10   25   18   22   16
 10   13   21   17   20   16
  0   14   18   18   21   13
 13   16   20   16   23   14
 14   17   23   20   22   15

Show that analysis of variance leads to the following results:

Source of variation   D.F.   Sum of squares   Mean square      F
Seasons                 5        492.33          98.46       55.6**
Districts               5         44.33           8.86        5.0**
Error                  25         44.33           1.77         -
Total                  35        580.99            -           -

Both the variance ratios are highly significant. Show that the S.E.
of the difference of the means for two seasons or two districts is
0.77; and deduce that the prices in the first, second and last districts
do not differ significantly.

6. An agricultural experiment, on the Latin square plan, gave
the following results for the yield of wheat per acre, letters corresponding
to varieties, columns to treatments and rows to blocks.
Discuss the variation of yield with each of the factors:

 A 16   B 10   C 11   D  9   E  9
 E 10   C  9   A 14   B 12   D 11
 B 15   D  8   E  8   C 10   A 18
 D 12   E  6   B 13   A 13   C 12
 C 13   A 11   D 10   E  7   B 14

Show that the results obtained by analysis of variance are:


Source of variation   D.F.   Sum of squares   Mean square      F
Treatments              4         66.56          16.64       37.8**
Blocks                  4          2.16           0.54        1.2
Varieties               4        122.56          30.64       69.6**
Error                  12          5.28           0.44         -
Total                  24        196.56            -           -
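
The entries of this table can be verified directly from the square. The Python sketch below is ours (the names `square` and `ss_from_totals` are illustrative); it forms the sums of squares from the totals of rows, columns and letters, and obtains the error sum by subtraction.

```python
# Latin square of Example 6: rows are blocks, columns are treatments,
# letters are varieties; each cell is (variety, yield).
square = [
    [("A", 16), ("B", 10), ("C", 11), ("D", 9), ("E", 9)],
    [("E", 10), ("C", 9), ("A", 14), ("B", 12), ("D", 11)],
    [("B", 15), ("D", 8), ("E", 8), ("C", 10), ("A", 18)],
    [("D", 12), ("E", 6), ("B", 13), ("A", 13), ("C", 12)],
    [("C", 13), ("A", 11), ("D", 10), ("E", 7), ("B", 14)],
]
p = len(square)
values = [v for row in square for _, v in row]
correction = sum(values) ** 2 / p**2          # (grand total)^2 / N


def ss_from_totals(totals):
    """Sum of squares between p classes of p members each."""
    return sum(t * t for t in totals) / p - correction


row_totals = [sum(v for _, v in row) for row in square]
col_totals = [sum(square[i][j][1] for i in range(p)) for j in range(p)]
letters = sorted({k for row in square for k, _ in row})
var_totals = [sum(v for row in square for k, v in row if k == name)
              for name in letters]

ss_blocks = ss_from_totals(row_totals)
ss_treatments = ss_from_totals(col_totals)
ss_varieties = ss_from_totals(var_totals)
ss_total = sum(v * v for v in values) - correction
ss_error = ss_total - ss_blocks - ss_treatments - ss_varieties
ms_error = ss_error / ((p - 1) * (p - 2))     # 12 d.f. for a 5 x 5 square

f_ratios = {
    "treatments": ss_treatments / (p - 1) / ms_error,
    "blocks": ss_blocks / (p - 1) / ms_error,
    "varieties": ss_varieties / (p - 1) / ms_error,
}
se_difference = (2 * ms_error / p) ** 0.5     # S.E. of a difference of two means

print({k: round(v, 1) for k, v in f_ratios.items()}, round(se_difference, 2))
```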


Show also that the S.E. of the difference of the means of two
classes is 0.42; and hence that for no two blocks do the means differ
significantly.

7. A random sample of 100 pairs of values from a normal population,
grouped in 10 arrays of y's, gave a correlation ratio of y on x
equal to 0.3. Show that this value is not significant of association
between the variables.

8. A random sample of 160 pairs from a bivariate normal population
when grouped in 16 arrays of y's gave values $r = 0.4$ and
$\eta_{yx} = 0.5$. Show that these results are consistent with the assumption
of linearity of regression of y on x.

9. A parabola, fitted to a random sample of 46 pairs of values
from a normal population, gave an index of correlation $R = 0.3$.
Show that this value is not significant of parabolic regression. Also
show that it would not be significant for a sample of less than
83 pairs.

10. Give a direct proof that the expected value of the last sum
in § 105 (17) is $(h-1)\mu_{11}$.

11. Give a direct proof of the formula § 106 (21) for $C$.

12. In the example of § 107, show that the sum of squares represented
by $B - C^2/A$, which corresponds to 4 d.f., is highly significant.
Hence the deviations of the means of districts from the line of
regression fitted to them are too great to be attributed to chance.

13. Supply the details of the proof of § 108 (26).

14. With the notation of § 107 the difference of the means of the
y's for the $p$th and $q$th classes, when corrected for regression, is

$$\bar{y}_p - \bar{y}_q - b'(\bar{x}_p - \bar{x}_q), \tag{i}$$

which consists of two independent parts. The estimated variance of
the first part is $2s^2/k$, where $k$ is the number in each class, and
$s^2$ is the error mean square $(B' - C'^2/A')/(N-h-1)$ in the analysis
of residual variance. The estimated variance of $b'$ is $s^2/A'$. Hence
show that the estimated variance of the difference (i) is

$$s^2\,[\,2/k + (\bar{x}_p - \bar{x}_q)^2/A'\,]. \tag{ii}$$

The quotient of the difference (i) by the square root of the expression
(ii) is distributed like t with $N-h-1$ d.f. (Cf. Wishart, 1936, 2;
also Wishart and Sanders, 1936, 3, pp. 63-4.)

15. Examine, as in § 108, the covariation between the yields of
grain and straw, as indicated by the following data.* Each of 5
blocks was divided into 5 plots, and 5 different treatments (A, B, ...,
E) were distributed at random among the plots of each block. The
columns correspond to blocks, and the yields of grain and straw are
denoted by x and y respectively:

        x    y     x    y     x    y     x    y     x    y
 A:    65   32    68   26    65   32    71   26    62   33
 B:    75   38    54   20    71   32    71   28    64   29
 C:    72   33    69   30    69   38    69   30    61   30
 D:    70   29    69   27    72   26    70   24    70   31
 E:    70   37    67   29    52   33    66   23    66   40

The factors of classification are blocks and treatments. Diminishing
all the x's by 66 and all the y's by 26, verify that

$T_i = 22, -3, -1, 17, -7$;  $T = 28$,

$T_i' = 39, 2, 31, 1, 33$;  $T' = 106$,

and calculate the values of $T_j$ and $T_j'$. Hence show that the sums of
squares and products are given by:


Source of variation   D.F.   Sum of squares ($x^2$)   Sum of squares ($y^2$)   Sum of products   Coefficient of regression
Blocks                  4          135.04                   265.76                  2.68               0.020
Treatments              4           98.24                    87.36                -64.12              -0.653
Error                  16          455.36                   211.44                174.72               0.384
Total                  24          688.64                   564.56                113.28                 -


The estimate $s^2$ obtained from the residual sum of squares for
error is

$$s^2 = \frac{B' - C'^2/A'}{N-h-k} = \frac{144.34}{15} = 9.62.$$

To test the significance of $b'$ we compare with $s^2$ the estimate
$C'^2/A' = 67.1$ corresponding to 1 d.f. The variance ratio is

$$F = 67.1/9.62 = 6.9 \qquad (\nu_1 = 1,\ \nu_2 = 15).$$

Thus the value of $b'$ is decidedly significant of positive correlation
between x and y, apart from the factors of classification. To test
whether the yield of straw, corrected for yield of grain, varies
significantly with the treatment, we calculate

$$C = C_2 + C' = 110.6, \qquad A = A_2 + A' = 553.6,$$

and compare the estimate of variance $(B_2 + C'^2/A' - C^2/A)/(k-1)$
based on 4 d.f. with $s^2$ based on 15. The former has the value

$$(87.36 + 67.1 - 22.1)/4 = 33.09,$$

and the latter is 9.62. The variance ratio is

$$F = 33.09/9.62 = 3.44,$$

which, for $\nu_1 = 4$ and $\nu_2 = 15$, is significant at the 5 % level. We
conclude that the yield of straw, after correction for yield of grain,
does vary significantly with the treatment.

Show that the difference $b' - b_2$, tested by § 108 (26), is decidedly
significant.

* Data adapted from Fisher and Eden, Journ. Agric. Sci., vol. 17 (1927),
p. 548.
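
The resolution (25) and the tests of § 108 may be checked numerically for this example. The Python sketch below is ours: the function `components` is a hypothetical helper, and the x values are entered with the digits that reproduce the printed totals $T_i$ and the printed sums of squares, which is an assumption wherever the printed table is hard to read. Suffix 1 refers to blocks (columns), suffix 2 to treatments (rows), and `e` to error.

```python
# Yields of grain (x) and straw (y): rows are treatments A-E,
# columns are the five blocks.
x = [[65, 68, 65, 71, 62],
     [75, 54, 71, 71, 64],
     [72, 69, 69, 69, 61],
     [70, 69, 72, 70, 70],
     [70, 67, 52, 66, 66]]
y = [[32, 26, 32, 26, 33],
     [38, 20, 32, 28, 29],
     [33, 30, 38, 30, 30],
     [29, 27, 26, 24, 31],
     [37, 29, 33, 23, 40]]
k, h = len(x), len(x[0])      # k groups (treatments), h classes (blocks)
N = h * k


def components(u, v):
    """Return (classes, groups, error, total) sums of products of u and v."""
    corr = sum(map(sum, u)) * sum(map(sum, v)) / N
    total = sum(u[j][i] * v[j][i] for j in range(k) for i in range(h)) - corr
    classes = sum(
        sum(u[j][i] for j in range(k)) * sum(v[j][i] for j in range(k))
        for i in range(h)
    ) / k - corr
    groups = sum(sum(u[j]) * sum(v[j]) for j in range(k)) / h - corr
    return classes, groups, total - classes - groups, total


A1, A2, Ae, At = components(x, x)
B1, B2, Be, Bt = components(y, y)
C1, C2, Ce, Ct = components(x, y)

s2 = (Be - Ce**2 / Ae) / (N - h - k)                  # error variance, 15 d.f.
F_b = (Ce**2 / Ae) / s2                               # significance of b'
A, C = A2 + Ae, C2 + Ce                               # treatments + error
F_treat = (B2 + Ce**2 / Ae - C**2 / A) / (k - 1) / s2
F_diff = A2 * Ae * (C2 / A2 - Ce / Ae) ** 2 / (A2 + Ae) / s2   # (26) for b_2

print(round(F_b, 2), round(F_treat, 2), round(F_diff, 2))
```

The first two ratios agree with the values 6.9 and 3.44 of the text to within the rounding of the intermediate figures used there; the third, with 1 and 15 d.f., is the ratio required by the final part of the exercise.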





CHAPTER XII


MULTIVARIATE DISTRIBUTIONS.
PARTIAL AND MULTIPLE CORRELATIONS

109. Introductory. Yule's notation

In considering a bivariate distribution we saw that, when it is 
known that the values of one variable are influenced by those of 
another, the coefficient of correlation provides a useful measure of 
the degree of association between them. But it often happens that 
the values of a variable are influenced by those of several others. 
It is known, for instance, that the statures of men are influenced 
by those of their ancestors; and the yield of grain is affected by the 
amounts of different fertilizers used. In such cases our data usually 
constitute a distribution of values of several variables. If we are 
concerned with the combined influence of a group of variables upon 
a variable not included in that group, our study is that of multiple 
regression and multiple correlation. If, however, we wish to examine 
the effect of one variable upon a second, after eliminating the effects 
of other variables, our problem is that of partial correlation. The 
analysis involved is rendered simple and compact by a notation due 
to Yule,* which has gained a fairly wide acceptance in recent years. 
Before applying this to the more general case of several variables, 
we shall pave the way for the student by reviewing the results 
obtained for two variables in Chapter iv, and expressing them in 
Yule’s notation. 

Let the two variables, measured from their means, be $x_1$ and $x_2$,
with standard deviations $\sigma_1$ and $\sigma_2$. If the lines of regression of
$x_1$ on $x_2$, and of $x_2$ on $x_1$, are

$$x_1 = b_{12}x_2, \qquad x_2 = b_{21}x_1 \tag{1}$$

respectively, the residuals, or errors of estimate of the variables
incurred by using these equations, are expressed by

$$x_{1.2} = x_1 - b_{12}x_2, \qquad x_{2.1} = x_2 - b_{21}x_1. \tag{2}$$

* Yule, 1907, 1.

These residuals are the deviations of the representative points from
the corresponding line of regression. The values of the coefficients
of regression, $b_{12}$ and $b_{21}$, are obtained by minimizing the sums of
squares of the residuals; and this is done by equating to zero the
partial derivatives of these sums with respect to $b_{12}$ and $b_{21}$, and the
constant terms. This leads to the normal equations

$$\sum (x_1 - b_{12}x_2) = 0, \qquad \sum x_2(x_1 - b_{12}x_2) = 0 \tag{3}$$

in the one case, and

$$\sum (x_2 - b_{21}x_1) = 0, \qquad \sum x_1(x_2 - b_{21}x_1) = 0 \tag{3'}$$

in the other. The first equation in each case expresses the fact that
the mean of the distribution is on the line of regression; and, with
the mean as origin, the equations of these lines have the simple
form (1). The above normal equations, expressed in the more
compact notation, are

$$\sum x_{1.2} = 0, \qquad \sum x_2x_{1.2} = 0 \tag{3}$$

and

$$\sum x_{2.1} = 0, \qquad \sum x_1x_{2.1} = 0 \tag{3'}$$

respectively, the summation including all pairs of values of the
distribution. It will be observed that, whereas in Chapter IV
subscripts were used to distinguish the pairs of values in the
distribution, they are here used to distinguish the different variables. They
are no longer needed for the former purpose, since $\sum$ will throughout
denote summation over the whole distribution.

From (3) and (3') we have the familiar values of the coefficients
of regression,

$$b_{12} = \frac{\sum x_1x_2}{\sum x_2^2}, \qquad b_{21} = \frac{\sum x_1x_2}{\sum x_1^2}.$$

The coefficient of correlation, which we denote by $r_{12}$, is
such that

$$r_{12}^2 = b_{12}b_{21}, \tag{4}$$

and

$$b_{12} = r_{12}\sigma_1/\sigma_2, \qquad b_{21} = r_{21}\sigma_2/\sigma_1, \tag{5}$$

$r_{12}$ having the same sign as $b_{12}$ or $b_{21}$, which is the sign of $\sum x_1x_2$. It
should be remembered that, although $r_{12} = r_{21}$, the values of $b_{12}$
and $b_{21}$ are in general different. Lastly the mean squares of the
deviations (2) are denoted by $\sigma_{1.2}^2$ and $\sigma_{2.1}^2$ respectively. In virtue
of § 26 (21) and (23), these are given by

$$\sigma_{1.2}^2 = \frac{1}{N}\sum x_{1.2}^2 = \sigma_1^2(1 - r_{12}^2), \qquad
\sigma_{2.1}^2 = \frac{1}{N}\sum x_{2.1}^2 = \sigma_2^2(1 - r_{21}^2). \tag{6}$$

The quantities $\sigma_{1.2}$ and $\sigma_{2.1}$ are referred to as standard deviations of
the first order, the order being the number of subscripts following the
point. These are called secondary subscripts, while primary subscripts
are those preceding the point. The standard deviations $\sigma_1$ and $\sigma_2$ are of
zero order. The residuals $x_{1.2}$ and $x_{2.1}$ are deviations of the first order.

The regression of $x_1$ on $x_2$ is said to be linear, if the mean of each
array of $x_1$'s is on the line of regression.


110. Distribution of three or more variables 

Consider next a distribution of three variables which, measured
from their means, are $x_1$, $x_2$ and $x_3$. The data consist of N sets of
corresponding values of the three variables. We enquire first as to
the best estimate of $x_1$ that can be obtained from the data in the
form of a linear function of $x_2$ and $x_3$. The regression equation is
thus of the form

$$x_1 = a + b_{12.3}x_2 + b_{13.2}x_3.$$

The 'best' estimate is interpreted in accordance with the principle
of least squares, the constants being chosen so as to make the sum
of the squares of the errors of estimate a minimum. The subscripts
to the coefficients are written down on the following principle. The
first is that of the variable for which the estimate is being found,
and the second is that of the variable which the coefficient multiplies.
These are primary subscripts. Separated from them by a point are
the subscripts of the other variables that enter into the equation.
These are secondary subscripts, and their number determines the
order of the regression coefficient. In the above equation the
coefficients are partial regression coefficients of the first order. The
significance of the term 'partial' will be explained shortly. The
constant a has been given no subscripts, because it will appear
immediately that its value is zero.

The sum of the squares of the residuals to be minimized is

$$\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3)^2.$$

Equating to zero its partial derivatives with respect to a and the
b's we have the normal equations

$$\sum (x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0,$$
$$\sum x_2(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0,$$
$$\sum x_3(x_1 - a - b_{12.3}x_2 - b_{13.2}x_3) = 0.$$

Since the mean of each variable is zero, the first of these equations
gives a = 0; and the regression equation of $x_1$ on $x_2$ and $x_3$ is simply

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3. \tag{7}$$

The other two normal equations may then be written more concisely

$$\sum x_2x_{1.23} = 0 = \sum x_3x_{1.23}, \tag{8}$$

in which $x_{1.23}$, defined by

$$x_{1.23} = x_1 - b_{12.3}x_2 - b_{13.2}x_3, \tag{9}$$

is the residual, or error of estimate of $x_1$ from the regression equation
(7). Its mean is zero, since those of the other variables are zero.
Hence the s.d. of this residual, denoted by $\sigma_{1.23}$, is given by

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2, \tag{10}$$

the summation covering the whole distribution, whose total
frequency is N. This is a s.d. of order two, since it has two secondary
subscripts.


Similarly, we have the regression equation of $x_2$ on $x_1$ and $x_3$,
namely,

$$x_2 = b_{21.3}x_1 + b_{23.1}x_3, \tag{11}$$

for which the error of estimate is

$$x_{2.13} = x_2 - b_{21.3}x_1 - b_{23.1}x_3, \tag{12}$$

and the normal equations

$$\sum x_1x_{2.13} = 0 = \sum x_3x_{2.13}. \tag{13}$$

There are also similar equations for the regression of $x_3$ on $x_1$ and
$x_2$. In general the coefficients $b_{12.3}$ and $b_{21.3}$ are different.

The normal equations (3), (8) and (13) express that the sum of the
products of corresponding values of a variable and a residual is zero,
when the subscript of the variable is included among the secondary
subscripts of the residual, the summation covering the whole
distribution. Further, if $x_{1.2}$ is the residual defined by (2), we have

$$\sum x_{1.23}x_{1.2} = \sum x_{1.23}(x_1 - b_{12}x_2) = \sum x_{1.23}x_1.$$

Similarly,

$$\sum x_{1.23}x_{1.23} = \sum x_{1.23}(x_1 - b_{12.3}x_2 - b_{13.2}x_3) = \sum x_{1.23}x_1$$

in virtue of (8). Thus it is evident that the sum of the products of two
residuals is unaltered by omitting from one of the factors any secondary
subscripts which are common to both. In virtue of this result and the
normal equations it follows that the sum of the products of two residuals
is zero if all the subscripts of the one are included among the secondary
subscripts of the other.
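The normal equations (8) and the theorem on common secondary subscripts may be checked numerically. The following is only a sketch: the six sets of values are invented for illustration, and the helper names `centre`, `dot` and `fit_two` are hypothetical, not part of the text.

```python
# Illustrative check of the normal equations (8) on made-up data.

def centre(v):
    """Measure a variable from its mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    """Sum of products over the whole distribution."""
    return sum(a * b for a, b in zip(u, v))

def fit_two(x1, x2, x3):
    """Solve the two normal equations for b12.3 and b13.2 by Cramer's rule."""
    s22, s33, s23 = dot(x2, x2), dot(x3, x3), dot(x2, x3)
    s12, s13 = dot(x1, x2), dot(x1, x3)
    det = s22 * s33 - s23 * s23
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    return b12_3, b13_2

# Hypothetical data, N = 6 sets of values, measured from their means.
x1 = centre([2.0, 4.0, 5.0, 4.0, 7.0, 8.0])
x2 = centre([1.0, 2.0, 4.0, 3.0, 5.0, 6.0])
x3 = centre([3.0, 1.0, 2.0, 5.0, 4.0, 6.0])

b12_3, b13_2 = fit_two(x1, x2, x3)
# Second-order residual, as in (9).
x1_23 = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]

# First-order residual, as in (2).
b12 = dot(x1, x2) / dot(x2, x2)
x1_2 = [a - b12 * b for a, b in zip(x1, x2)]

print(dot(x2, x1_23), dot(x3, x1_23))    # both vanish up to rounding: equations (8)
print(dot(x1_23, x1_2), dot(x1_23, x1))  # equal: the common subscript 2 may be omitted
```

The last line illustrates the rule for omitting common secondary subscripts; both sums also equal the sum of squares of the residual $x_{1.23}$.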

The argument of this section is also applicable to the case of n
variables $x_1, x_2, \ldots, x_n$. In the regression equation of $x_1$ on $x_2, \ldots, x_n$,
which corresponds to (7), the coefficients have each n − 2 secondary
subscripts, and are therefore of that order. The residual $x_{1.23\ldots n}$
corresponding to (9) is of order n − 1, as is also its s.d. $\sigma_{1.23\ldots n}$.
The normal equations are

$$\sum x_ix_{1.23\ldots n} = 0 \qquad (i = 2, 3, \ldots, n).$$

And the theorem concerning the omission of common secondary
subscripts is equally valid in the case of n variables.

111. Determination of the coefficients of regression 

The regression equation (7) may be interpreted as representing a
plane, called the plane of regression of $x_1$ on $x_2$ and $x_3$. The normal
equations (8) may be expressed

$$\sigma_1\sigma_2r_{12} - b_{12.3}\sigma_2^2 - b_{13.2}\sigma_3\sigma_2r_{32} = 0,$$
$$\sigma_1\sigma_3r_{13} - b_{12.3}\sigma_2\sigma_3r_{23} - b_{13.2}\sigma_3^2 = 0, \tag{14}$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$, obtained by ignoring
the values of $x_3$. This is called the total correlation of $x_1$ and $x_2$.
Similarly, $r_{23}$ is the total correlation of $x_2$ and $x_3$, and so on. These
coefficients are symmetrical in the subscripts. We may eliminate
the b's between (7) and (14) by equating to zero the determinant of
their coefficients and the remaining terms. Dividing the second
and third rows of this determinant by $\sigma_2$ and $\sigma_3$ respectively, and
then the first, second and third columns by $\sigma_1$, $\sigma_2$ and $\sigma_3$
respectively, we obtain the regression equation of $x_1$ on the other variables
in the form

$$\begin{vmatrix}
x_1/\sigma_1 & x_2/\sigma_2 & x_3/\sigma_3 \\
r_{12} & 1 & r_{32} \\
r_{13} & r_{23} & 1
\end{vmatrix} = 0. \tag{15}$$

If then $\omega$ is used to denote the determinant

$$\omega = \begin{vmatrix}
1 & r_{21} & r_{31} \\
r_{12} & 1 & r_{32} \\
r_{13} & r_{23} & 1
\end{vmatrix} \tag{16}$$

and $\omega_{ij}$ is the cofactor of the element in the ith column and the
jth row, we may write (15) as

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \frac{x_3}{\sigma_3}\omega_{13} = 0. \tag{17}$$

From this form of the regression equation it follows that

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\omega_{12}}{\omega_{11}}, \qquad
b_{13.2} = -\frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}}. \tag{18}$$

The regression of $x_1$ on $x_2$ and $x_3$ is said to be linear if the mean of
each array of $x_1$'s lies on the plane of regression (17).

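The residual $x_{1.23}$ may be looked upon as the deviation of the
representative point from the plane of regression. The term deviation
is therefore often used in place of residual. The variance $\sigma_{1.23}^2$ of
$x_{1.23}$ may be expressed in terms of $\omega$ and $\omega_{11}$. For

$$N\sigma_{1.23}^2 = \sum x_{1.23}^2 = \sum x_1(x_1 - b_{12.3}x_2 - b_{13.2}x_3).$$

This is equivalent to

$$(\sigma_1^2 - \sigma_{1.23}^2) - b_{12.3}\sigma_1\sigma_2r_{21} - b_{13.2}\sigma_1\sigma_3r_{31} = 0,$$

and, on eliminating the b's between this equation and (14), we have
a result which may be expressed

$$\begin{vmatrix}
1 - \sigma_{1.23}^2/\sigma_1^2 & r_{21} & r_{31} \\
r_{12} & 1 & r_{32} \\
r_{13} & r_{23} & 1
\end{vmatrix} = 0,$$

or

$$\omega - \frac{\sigma_{1.23}^2}{\sigma_1^2}\,\omega_{11} = 0.$$

Consequently

$$\sigma_{1.23}^2 = \sigma_1^2\,\omega/\omega_{11}. \tag{19}$$

The above argument clearly holds for the more general case of
n variables. In place of (16) we then have a determinant $\omega$ of order
n; and the regression equation of $x_1$ on $x_2, \ldots, x_n$ is

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \ldots + \frac{x_n}{\sigma_n}\omega_{1n} = 0. \tag{20}$$

From this we obtain, as in (18), the regression coefficients of order
n − 2. In place of (19) we have an equation giving the variance of
the deviation $x_{1.23\ldots n}$, namely,

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2\,\omega/\omega_{11}. \tag{21}$$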

Example. Let $x_1$, $x_2$, $x_3$ in. be the excesses of the heights of father, mother
and son respectively above their mean values. A distribution of these variables
gave the following approximate correlations and standard deviations:*

$$r_{12} = 0.28, \qquad r_{23} = 0.49, \qquad r_{31} = 0.51,$$
$$\sigma_1 = 2.7, \qquad \sigma_2 = 2.4, \qquad \sigma_3 = 2.7.$$

Show that the regression equation of $x_3$ on $x_1$ and $x_2$ is

$$x_3 = 0.40x_1 + 0.42x_2,$$

and deduce that, if the mean heights of father, mother and son are 67.68,
62.48 and 68.65 in. respectively, the regression equation for actual heights
$X_1$, $X_2$, $X_3$ is

$$X_3 = 15.3 + 0.40X_1 + 0.42X_2.$$

Also show that $\sigma_{3.12} = 2.1$.
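The arithmetic of this example may be sketched in a few lines, using the cofactor formulae (18) and (19). The function names below are hypothetical and the indices 0, 1, 2 stand for father, mother and son; this is an illustration under those assumptions, not a prescribed method.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cofactor(m, i, j):
    """Cofactor of element (i, j) of a 3x3 matrix (0-based indices)."""
    rows = [a for a in range(3) if a != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def regression_on_others(r, sigma, i):
    """Partial regression coefficients of x_i on the other two variables,
    and the s.d. of the residual, from formulae of the type (18) and (19)."""
    omega = det3(r)
    w_ii = cofactor(r, i, i)
    coeffs = {j: -(sigma[i] / sigma[j]) * cofactor(r, i, j) / w_ii
              for j in range(3) if j != i}
    resid_sd = sigma[i] * math.sqrt(omega / w_ii)
    return coeffs, resid_sd

# Total correlations and standard deviations of the example.
r = [[1.00, 0.28, 0.51],
     [0.28, 1.00, 0.49],
     [0.51, 0.49, 1.00]]
sigma = [2.7, 2.4, 2.7]

coeffs, sd = regression_on_others(r, sigma, 2)
print(round(coeffs[0], 2), round(coeffs[1], 2), round(sd, 1))  # 0.4 0.42 2.1
```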

* Modified data from Pearson and Lee, Biometrika, vol. 2 (1903), pp.
357-462.

112. Multiple correlation

We proceed to find the correlation between the variable $x_1$ and
its estimate from the regression equation (7). This correlation,
which is an indication of the agreement between $x_1$ and its estimate,
is called the coefficient of multiple correlation between $x_1$ and the two
variables $x_2$ and $x_3$, and is denoted* by $R_{1(23)}$. It may be determined
as follows. If $e_{1.23}$ is the estimate of $x_1$ given by (7), we have

$$e_{1.23} = b_{12.3}x_2 + b_{13.2}x_3$$

or

$$e_{1.23} = x_1 - x_{1.23}.$$

The mean value of $e_{1.23}$ is zero, since those of $x_2$ and $x_3$ are
zero. The sum of the products of $x_1$ and $e_{1.23}$ is

$$\sum x_1e_{1.23} = \sum x_1(x_1 - x_{1.23}) = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Also

$$\sum e_{1.23}^2 = \sum (x_1 - x_{1.23})^2 = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Consequently the coefficient of correlation between $x_1$ and $e_{1.23}$ is
given by

$$R_{1(23)} = \frac{\sigma_1^2 - \sigma_{1.23}^2}{\sigma_1\sqrt{\sigma_1^2 - \sigma_{1.23}^2}}
= \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}}. \tag{22}$$

This is the required coefficient of multiple correlation. The result
may be expressed

$$\sigma_{1.23}^2 = \sigma_1^2(1 - R_{1(23)}^2), \tag{23}$$

which is analogous to (6) and to § 41 (46). Comparing (23) and (19)
we see that

$$1 - R_{1(23)}^2 = \omega/\omega_{11}, \tag{24}$$

whence

$$R_{1(23)}^2 = 1 - \frac{\omega}{1 - r_{23}^2}
= \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{23}r_{31}}{1 - r_{23}^2}, \tag{25}$$

which expresses the multiple correlation in terms of the total
correlations between the pairs of variables.
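As a brief illustration of (25), the multiple correlation can be computed directly from the three total correlations. The function name is hypothetical; the numbers are those of the heights example in § 111, with the son taken as the variable being estimated.

```python
import math

def multiple_correlation(r12, r13, r23):
    """R_1(23) from the total correlations, following formula (25)."""
    r_sq = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_sq)

# Son on father and mother: the son plays the part of variable 1 here,
# so r12 = 0.51 (son, father), r13 = 0.49 (son, mother), r23 = 0.28.
R = multiple_correlation(r12=0.51, r13=0.49, r23=0.28)
print(round(R, 2))  # 0.63
```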


* Some writers prefer the notation $R_{1.23}$.

We may note that $R_{1(23)}$ is never negative. For, from the above
argument, $\sum x_1e_{1.23} = \sum e_{1.23}^2$, which cannot be negative. Further,
when $R_{1(23)} = 1$ it follows from (23) that $\sigma_{1.23} = 0$, which requires
all the deviations $x_{1.23}$ to be zero, so that $x_1$ is given accurately by
the regression equation. In this case $x_1$ is a linear function of $x_2$
and $x_3$.

The argument is also valid for a distribution of n variables. The
multiple correlation $R_{1(23\ldots n)}$ is the correlation between $x_1$ and its
estimate $e_{1.23\ldots n}$ from the regression equation (20); and the
argument leads to the corresponding formulae

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - R_{1(23\ldots n)}^2) \tag{23'}$$

and

$$1 - R_{1(23\ldots n)}^2 = \omega/\omega_{11}. \tag{24'}$$

If the multiple correlation is equal to unity, $x_1$ is a linear function
of $x_2, \ldots, x_n$.

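Example 1. Prove the following relations:

$$\sum x_{1.23}e_{1.23} = 0,$$
$$\sum e_{1.23}^2 = b_{12.3}^2\sum x_2^2 + b_{13.2}^2\sum x_3^2 + 2b_{12.3}b_{13.2}\sum x_2x_3,$$
$$R_{1(23)}^2 = (b_{12.3}r_{12}\sigma_2 + b_{13.2}r_{13}\sigma_3)/\sigma_1.$$

Example 2. Show that, for the example in the preceding section,

$$R_{3(12)} = 0.63.$$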


113. Partial correlation 

Consider next the correlation between the deviations $x_{1.3}$ and
$x_{2.3}$. Since $x_{1.3}$ is the deviation of $x_1$ from its estimate in terms of
$x_3$, we may regard it as that part of the variable $x_1$ which remains
after the influence of $x_3$ has been eliminated, as far as can be done by
a linear equation. A similar interpretation can be given to $x_{2.3}$.
Hence the correlation between these deviations may be looked upon
as the correlation between $x_1$ and $x_2$ after the influence of $x_3$ has been
eliminated. We denote this correlation by $r_{12.3}$, and call it the partial
correlation between $x_1$ and $x_2$ in the trivariate distribution. Having
one secondary subscript, $r_{12.3}$ is a partial correlation of the first
order. It will be remembered that, in calculating the total correlation
$r_{12}$, the values of $x_3$ are simply ignored. It is therefore a correlation
calculated on the assumption that the variables $x_1$ and $x_2$ are
influenced only by each other, and not by any other variable.

To find the partial correlation $r_{12.3}$ we observe that, by the
theorems of § 110,

$$0 = \sum x_{2.3}x_{1.23} = \sum x_{2.3}(x_1 - b_{12.3}x_2 - b_{13.2}x_3)$$
$$= \sum x_1x_{2.3} - b_{12.3}\sum x_2x_{2.3} = \sum x_{1.3}x_{2.3} - b_{12.3}\sum x_{2.3}^2,$$

so that

$$b_{12.3} = \sum x_{1.3}x_{2.3}\Big/\sum x_{2.3}^2. \tag{26}$$

From this result it follows that $b_{12.3}$ is the coefficient of regression
of $x_{1.3}$ on $x_{2.3}$. Similarly, $b_{21.3}$ is the coefficient of regression of $x_{2.3}$
on $x_{1.3}$; and the coefficient of correlation between these deviations
is given by

$$r_{12.3}^2 = b_{12.3}b_{21.3}. \tag{27}$$

Since $\sigma_{1.3}$ and $\sigma_{2.3}$ are the standard deviations of $x_{1.3}$ and $x_{2.3}$, the
coefficient of partial correlation is connected with the coefficient of
partial regression by the usual formulae

$$b_{12.3} = r_{12.3}\sigma_{1.3}/\sigma_{2.3}, \qquad b_{21.3} = r_{21.3}\sigma_{2.3}/\sigma_{1.3}. \tag{28}$$

And it is now clear why the above b's are called coefficients of partial
regression. The correlation $r_{12.3}$ has the same sign as $b_{12.3}$, which is
the sign of $-\omega_{12}$ by (18). Also, in virtue of (27), $r_{12.3}$ is symmetrical
in the primary subscripts. If the values of the b's are substituted
from (18) in (27) we obtain

$$r_{12.3}^2 = \frac{\omega_{12}^2}{\omega_{11}\omega_{22}},$$

so that

$$r_{12.3} = \frac{-\omega_{12}}{\sqrt{\omega_{11}\omega_{22}}}
= \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}. \tag{29}$$

Similar formulae may be written down for $r_{13.2}$ and $r_{23.1}$. The partial
correlations of the first order are thus expressible in terms of the
total correlation coefficients.
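A short numerical sketch of formula (29) may help. The function name is hypothetical, and the three total correlations used are those given in Ex. 1 of Examples XII at the end of the chapter.

```python
import math

def partial_corr(r12, r13, r23):
    """First-order partial correlation r_12.3 from total correlations, as in (29)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

r12, r13, r23 = 0.70, 0.60, 0.40

r12_3 = partial_corr(r12, r13, r23)   # variables 1 and 2, eliminating 3
r13_2 = partial_corr(r13, r12, r23)   # variables 1 and 3, eliminating 2
r23_1 = partial_corr(r23, r12, r13)   # variables 2 and 3, eliminating 1

print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 3))  # 0.63 0.49 -0.035
```

The other two coefficients are obtained simply by permuting the arguments, since each call names first the pair of interest and then the two correlations involving the eliminated variable.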

The partial correlation between $x_1$ and $x_2$ in the trivariate
distribution is sometimes defined as the correlation between $x_1$ and $x_2$
for a constant value of $x_3$. In general, however, this correlation will
depend upon the constant value of $x_3$ selected. In certain special
cases the second definition agrees with the first, and the result is
then the same for all constant values of $x_3$. Necessary and sufficient
conditions* for this agreement may be stated:

(a) In the bivariate distribution of $x_1$ and $x_3$ ($x_2$ being ignored),
the regression of $x_1$ on $x_3$ must be linear, and the standard
deviations of all the $x_1$ arrays ($x_3$ = const.) must be equal.

(b) In the trivariate distribution the regression of $x_1$ on $x_2$ and
$x_3$ must be linear, and the standard deviations of all the $x_1$ arrays
($x_2$ = const., $x_3$ = const.) must be equal.

These conditions are satisfied in the normal trivariate distribution
to be considered in § 116.

For a distribution of n variables there are partial correlations of
all orders from 1 to n − 2. Thus if (k) denotes a definite group of
secondary subscripts not including i and j, the correlation between
the deviations $x_{i.(k)}$ and $x_{j.(k)}$, denoted by $r_{ij.(k)}$, is a partial
correlation of order equal to the number m of subscripts in (k). It is
connected with the regression coefficients for these deviations, and their
standard deviations $\sigma_{i.(k)}$ and $\sigma_{j.(k)}$, by formulae analogous to (27) and
(28), namely,

$$r_{ij.(k)}^2 = b_{ij.(k)}b_{ji.(k)} \tag{30}$$

and

$$b_{ij.(k)} = r_{ij.(k)}\sigma_{i.(k)}/\sigma_{j.(k)}.$$

If (k) includes all the subscripts but i and j we find as in (29)

$$r_{ij.(k)} = \frac{-\omega_{ij}}{\sqrt{\omega_{ii}\omega_{jj}}}, \tag{31}$$

the symbols on the right being cofactors in the determinant $\omega$ of
order n.

A partial correlation of order m + 1 is expressible in terms of
those of order m by an equation of the same form as (29), with a set
(k) of secondary subscripts added to each coefficient in the formula.
Thus

$$r_{ij.h(k)} = \frac{r_{ij.(k)} - r_{ih.(k)}r_{jh.(k)}}{\sqrt{(1 - r_{ih.(k)}^2)(1 - r_{jh.(k)}^2)}}, \tag{32}$$

where h, i, j are unequal.
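The recurrence (32) lends itself to a short recursive routine. The sketch below is illustrative only: the function name and the matrix of correlations between four variables are hypothetical. It also checks that the result does not depend on the order in which the secondary subscripts are eliminated.

```python
import math

def partial_corr(r, i, j, given=()):
    """Partial correlation r_ij.(given) by repeated use of the recurrence (32).

    r     : symmetric matrix of total correlations (nested lists)
    given : tuple of indices of the variables whose influence is eliminated
    """
    if not given:
        return r[i][j]
    h, rest = given[0], tuple(given[1:])
    r_ij = partial_corr(r, i, j, rest)
    r_ih = partial_corr(r, i, h, rest)
    r_jh = partial_corr(r, j, h, rest)
    return (r_ij - r_ih * r_jh) / math.sqrt((1 - r_ih ** 2) * (1 - r_jh ** 2))

# Hypothetical total correlations between four variables (indices 0 to 3).
r = [[1.0, 0.6, 0.5, 0.4],
     [0.6, 1.0, 0.3, 0.2],
     [0.5, 0.3, 1.0, 0.1],
     [0.4, 0.2, 0.1, 1.0]]

first_order = partial_corr(r, 0, 1, (2,))
second_a = partial_corr(r, 0, 1, (2, 3))
second_b = partial_corr(r, 0, 1, (3, 2))
print(first_order, second_a, second_b)  # the last two agree
```

The recursion recomputes lower-order coefficients several times, which is harmless for a handful of variables; for many variables the cofactor form (31) would presumably be more economical.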


Example. Prove that, for the example in § 111,

$$r_{31.2} = 0.445, \qquad r_{32.1} = 0.42.$$

* For a proof see Camp, 1934, 1, pp. 341-2.

114. Reduction formula for the order of a standard deviation

A s.d. of any order may be expressed in terms of a s.d. and a
correlation coefficient of lower order. Thus, since

$$\sum x_{1.23}^2 = \sum x_{1.2}x_{1.23} = \sum x_{1.2}(x_1 - b_{12.3}x_2 - b_{13.2}x_3)$$
$$= \sum x_{1.2}^2 - b_{13.2}\sum x_{1.2}x_{3.2},$$

we have, on dividing by N and using (26) with subscripts interchanged,

$$\sigma_{1.23}^2 = \sigma_{1.2}^2(1 - b_{13.2}b_{31.2}) = \sigma_{1.2}^2(1 - r_{13.2}^2), \tag{33}$$

which is of the same form as (6). Since $\sigma_{1.23}^2$ is symmetrical in the
secondary subscripts, the subscripts 2 and 3 may be interchanged
in the second member of (33). Substituting the value of $\sigma_{1.2}^2$ given
by (6) we may also write the result

$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2). \tag{34}$$

Similar formulae may be written down by cyclic permutation of the
subscripts. The equations (33) and (34) show how a s.d. may be
expressed in terms of one of lower order, and one or more of the
correlation coefficients. Also from (33) it follows that $\sigma_{1.23} \le \sigma_{1.2}$.
The estimate of $x_1$ from $x_2$ and $x_3$ is thus in general better than the
estimate from $x_2$ alone, being just as good only if $r_{13.2}$ is zero.
Further, on comparing (34) with (23) we see that

$$1 - R_{1(23)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2), \tag{35}$$

which is in agreement with the values already found for $R_{1(23)}$ and
$r_{13.2}$ in terms of the total correlations. From (35) it is clear that

$$1 - R_{1(23)}^2 \le 1 - r_{12}^2,$$

so that $R_{1(23)}^2 \ge r_{12}^2$.

Since $R_{1(23)}$ is symmetrical in the subscripts 2 and 3, it follows that
this coefficient of multiple correlation is not numerically less than
either $r_{12}$ or $r_{13}$. If then $R_{1(23)}$ is zero, both $r_{12}$ and $r_{13}$ must be zero,
and $x_1$ is then uncorrelated with either $x_2$ or $x_3$.
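The identity (35), and the inequality drawn from it, can be verified numerically. In the sketch below the function name is hypothetical and the correlations are illustrative values (those of Ex. 1 of Examples XII); $R_{1(23)}$ is computed from the total correlations as in § 112 and $r_{13.2}$ as in § 113.

```python
import math

def check_reduction(r12, r13, r23):
    """Return both members of (35), together with R_1(23)."""
    # Partial correlation r_13.2, by the formula of the same form as (29).
    r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
    # Squared multiple correlation from the total correlations.
    R_sq = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    lhs = 1 - R_sq
    rhs = (1 - r12 ** 2) * (1 - r13_2 ** 2)
    return lhs, rhs, math.sqrt(R_sq)

lhs, rhs, R = check_reduction(0.70, 0.60, 0.40)
print(lhs, rhs)      # the two members of (35) agree
print(round(R, 2))   # 0.78, not less than either 0.70 or 0.60
```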

The same argument shows that, in the case of n variables, the
equation (33) has the generalization

$$\sigma_{i.j(k)}^2 = \sigma_{i.(k)}^2(1 - r_{ij.(k)}^2), \tag{36}$$

where (k) denotes as usual a group of secondary subscripts. Repeated
application of this formula leads to a generalization of (34), namely,

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)\ldots(1 - r_{1n.23\ldots(n-1)}^2). \tag{37}$$

Comparing this with (23') we see that

$$1 - R_{1(23\ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)\ldots(1 - r_{1n.23\ldots(n-1)}^2). \tag{38}$$

If $R_{1(23\ldots n)} = 0$, each of the correlations in the second member is
zero, and also each of the coefficients $r_{12}, r_{13}, \ldots, r_{1n}$. Thus $x_1$ is then
uncorrelated with any of the other variables.


115. Reduction formula for the order of a regression coefficient

To express a coefficient of regression in terms of coefficients of
lower order we may proceed as follows. First

$$\sum x_{1.3}x_{2.3} = \sum x_1x_{2.3} = \sum x_1(x_2 - b_{23}x_3).$$

Then, since $b_{23} = r_{23}\sigma_2/\sigma_3$, we may write the above equation, in
virtue of (26),

$$\sigma_{2.3}^2b_{12.3} = r_{12}\sigma_1\sigma_2 - r_{13}\sigma_1\sigma_3 \cdot r_{32}\sigma_2/\sigma_3,$$

or, by means of (6),

$$\sigma_2^2(1 - r_{23}^2)b_{12.3} = \sigma_1\sigma_2(r_{12} - r_{13}r_{32}).$$

Thus we have the required reduction formula

$$b_{12.3} = \frac{b_{12} - b_{13}b_{32}}{1 - b_{23}b_{32}}. \tag{39}$$

This may be expressed in terms of correlations. For, in virtue of
(28), it is equivalent to

$$r_{12.3}\frac{\sigma_{1.3}}{\sigma_{2.3}} = \frac{\sigma_1}{\sigma_2}\left(\frac{r_{12} - r_{13}r_{32}}{1 - r_{23}^2}\right). \tag{40}$$

Substituting the values of $\sigma_{1.3}$ and $\sigma_{2.3}$ given by equations of the
form (6), we find (29) again.
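The reduction formula (39) may be tried on numbers. The sketch below uses hypothetical helper names and illustrative data (standard deviations 3, 4, 5 and total correlations 0.70, 0.60, 0.40, as in Ex. 1 of Examples XII); the zero-order coefficients are formed from $b_{ij} = r_{ij}\sigma_i/\sigma_j$, as in (5).

```python
def b_total(r_ij, s_i, s_j):
    """Zero-order regression coefficient b_ij = r_ij * sigma_i / sigma_j."""
    return r_ij * s_i / s_j

def b12_3_from_lower(b12, b13, b32, b23):
    """First-order coefficient b_12.3 from zero-order ones, formula (39)."""
    return (b12 - b13 * b32) / (1 - b23 * b32)

s1, s2, s3 = 3.0, 4.0, 5.0
r12, r13, r23 = 0.70, 0.60, 0.40

value = b12_3_from_lower(
    b_total(r12, s1, s2),   # b12
    b_total(r13, s1, s3),   # b13
    b_total(r23, s3, s2),   # b32
    b_total(r23, s2, s3),   # b23
)
# The same coefficient written directly in terms of correlations.
direct = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)
print(round(value, 2), round(direct, 2))  # 0.41 0.41
```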

In the case of n variables, by similar reasoning, we arrive at the
generalization of (39),

$$b_{12.3(k)} = \frac{b_{12.(k)} - b_{13.(k)}b_{32.(k)}}{1 - b_{23.(k)}b_{32.(k)}},$$

where (k) has the usual significance.

116. Normal distribution

A generalization of the bivariate normal distribution to the case
of three variables may be obtained as follows.* First suppose that
the variables $x_2$ and $x_3$ are normally distributed about zero means
with correlation $r_{23}$. Then the probability that a pair of values chosen
at random will fall in the interval $dx_2\,dx_3$ is

$$dP_1 = \frac{dx_2\,dx_3}{2\pi\sigma_2\sigma_3\sqrt{\omega_{11}}}
\exp\left\{-\frac{1}{2\omega_{11}}\left(\frac{x_2^2}{\sigma_2^2} - \frac{2r_{23}x_2x_3}{\sigma_2\sigma_3} + \frac{x_3^2}{\sigma_3^2}\right)\right\},$$

where $\omega_{11} = 1 - r_{23}^2$. Next assume that the regression of $x_1$ on $x_2$
and $x_3$ is linear, and that in each $x_1$ array the variable is normally
distributed with s.d. which is the same for each of these arrays.
Then, since the mean of each array is on the plane of regression (7),
the s.d. of each array is $\sigma_{1.23}$, given by

$$\sigma_{1.23}^2 = \sigma_1^2\,\omega/\omega_{11}. \tag{41}$$

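Consequently the probability that $x_1$, chosen at random in an
assigned array, will fall in the interval $dx_1$ is

$$dP_2 = \frac{dx_1}{\sigma_{1.23}\sqrt{2\pi}}
\exp\left\{-\frac{(x_1 - b_{12.3}x_2 - b_{13.2}x_3)^2}{2\sigma_{1.23}^2}\right\}$$

or

$$dP_2 = \frac{dx_1}{\sigma_1\sqrt{2\pi\omega/\omega_{11}}}
\exp\left\{-\frac{1}{2\omega\omega_{11}}\left(\omega_{11}\frac{x_1}{\sigma_1} + \omega_{12}\frac{x_2}{\sigma_2} + \omega_{13}\frac{x_3}{\sigma_3}\right)^2\right\}$$

by (41) and (17). On forming the product $dP_1\,dP_2$ we have the
probability that a set of values of the three variables, chosen at
random, will fall in the interval $dx_1\,dx_2\,dx_3$, as

$$dP = \frac{dx_1\,dx_2\,dx_3}{\sigma_1\sigma_2\sigma_3\sqrt{(2\pi)^3\omega}}\exp(-\tfrac{1}{2}\phi), \tag{42}$$

where, after a little reduction, it will be found that

$$\phi = \frac{1}{\omega}\left(\omega_{11}\frac{x_1^2}{\sigma_1^2} + \omega_{22}\frac{x_2^2}{\sigma_2^2} + \omega_{33}\frac{x_3^2}{\sigma_3^2}
+ 2\omega_{23}\frac{x_2x_3}{\sigma_2\sigma_3} + 2\omega_{31}\frac{x_3x_1}{\sigma_3\sigma_1} + 2\omega_{12}\frac{x_1x_2}{\sigma_1\sigma_2}\right). \tag{43}$$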

The symmetry of (42) and (43) in the subscripts shows that the
properties of the three variables are similar. Thus all the regressions
are linear. The variance of each array of $x_1$'s in the trivariate
distribution is $\sigma_{1.23}^2$; that of each array of $x_2$'s is

$$\omega\sigma_2^2/\omega_{22} = \sigma_{2.31}^2.$$

* Cf. Rietz, 1927, 2, pp. 106-7.

We may verify the statement made in § 113 that, in the present
case, the correlation between $x_1$ and $x_2$ for a constant value of $x_3$
is the partial correlation $r_{12.3}$. For $\phi$ may be expressed in the form

$$\phi = \frac{1}{\omega}\left[\omega_{11}\frac{(x_1 - b_{13}x_3)^2}{\sigma_1^2}
+ 2\omega_{12}\frac{(x_1 - b_{13}x_3)(x_2 - b_{23}x_3)}{\sigma_1\sigma_2}
+ \omega_{22}\frac{(x_2 - b_{23}x_3)^2}{\sigma_2^2}\right] + \frac{x_3^2}{\sigma_3^2} \tag{44}$$

and, on comparing (42) with § 38 (28), we see that, for a constant
value of $x_3$, $x_1$ and $x_2$ are normally distributed about means $b_{13}x_3$
and $b_{23}x_3$ respectively, with correlation

$$\frac{-\omega_{12}}{\sqrt{\omega_{11}\omega_{22}}} = r_{12.3},$$

as stated. From (42) and (44) it is also clear that the deviations $x_{1.3}$
and $x_{2.3}$ are normally distributed with correlation $r_{12.3}$.
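The agreement of the two expressions (43) and (44) for $\phi$ can be checked at any chosen point. The sketch below is a numerical check only, not a proof: the function names are hypothetical, and the standard deviations, correlations and the point x are arbitrary illustrative values.

```python
def cof(r, i, j):
    """Cofactor of element (i, j) of a 3x3 correlation matrix (0-based)."""
    rows = [a for a in range(3) if a != i]
    cols = [c for c in range(3) if c != j]
    minor = (r[rows[0]][cols[0]] * r[rows[1]][cols[1]]
             - r[rows[0]][cols[1]] * r[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def phi_43(x, s, r):
    """The quadratic form phi in its symmetrical form (43)."""
    w = [[cof(r, i, j) for j in range(3)] for i in range(3)]
    omega = sum(r[0][j] * w[0][j] for j in range(3))   # expansion by the first row
    u = [x[i] / s[i] for i in range(3)]
    return sum(w[i][j] * u[i] * u[j] for i in range(3) for j in range(3)) / omega

def phi_44(x, s, r):
    """The same quantity arranged as in (44), for a constant value of x3."""
    w = [[cof(r, i, j) for j in range(3)] for i in range(3)]
    omega = sum(r[0][j] * w[0][j] for j in range(3))
    b13 = r[0][2] * s[0] / s[2]   # zero-order regression of x1 on x3
    b23 = r[1][2] * s[1] / s[2]   # zero-order regression of x2 on x3
    u1 = (x[0] - b13 * x[2]) / s[0]
    u2 = (x[1] - b23 * x[2]) / s[1]
    quad = w[0][0] * u1 * u1 + 2 * w[0][1] * u1 * u2 + w[1][1] * u2 * u2
    return quad / omega + (x[2] / s[2]) ** 2

r = [[1.0, 0.7, 0.6],
     [0.7, 1.0, 0.4],
     [0.6, 0.4, 1.0]]
s = [3.0, 4.0, 5.0]
x = [1.5, -2.0, 0.8]

print(phi_43(x, s, r), phi_44(x, s, r))  # the two values agree
```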

The above results may be extended to the case of n variables.* 

117. Significance of an observed partial correlation

Fisher has proved that the sampling distribution of a partial
correlation coefficient† of order k, in samples from a normal population,
is of the same form as that of the correlation coefficient for
samples from a bivariate normal population, with the sample
number N reduced by k. In particular, when the partial correlation
in the population is zero, the square of the partial correlation
coefficient, r, in samples of N sets of values, is a
$\beta_1(\tfrac{1}{2}, \tfrac{1}{2}(N - k - 2))$
variate. By the argument used in § 89 it then follows that the
statistic t defined by

$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{N - k - 2} \tag{45}$$

conforms to the t distribution for (N − k − 2) d.f. We are thus able
to test the significance of an observed partial correlation.
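The statistic (45) is easily computed. The following sketch uses a hypothetical function name; the figures N = 21, k = 3, r = 0.40 are chosen for illustration, and the critical value of t must still be taken from a table.

```python
import math

def partial_corr_t(r, n, k):
    """Statistic t of (45) and its degrees of freedom, for a partial
    correlation r of order k observed in a sample of n sets of values."""
    df = n - k - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df

t, df = partial_corr_t(0.40, 21, 3)
print(round(t, 2), df)  # 1.75 16
```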


* Cf. Yule and Kendall, 1937, 1, pp. 282-4.

† Fisher, 1924, 4.

Example 1. From a random sample of 21 sets of values from a normal
population the calculated value of a partial correlation of order three is 0.40.
Is this consistent with the assumption that the corresponding partial
correlation in the population is zero?

In applying the above test to our assumption we have $\nu$ = 21 − 3 − 2 = 16,
and

$$t = \frac{0.40 \times 4}{\sqrt{0.84}} = \frac{1.6}{0.9165} = 1.75.$$

This value of t is not significant at the 5% level, so that the observed
correlation is not significant of correlation in the population.

From the sampling distribution it follows, as in the case of two
variables, that if Fisher's z transformation of § 94 is applied to the
above partial correlation of order k, the statistic z is distributed
nearly normally with variance 1/(N − k − 3). We are thus able to
test if an observed partial correlation differs significantly from some
assumed value. Similarly, we can test the significance of the
difference of the observed partial correlations in two independent
samples.

Example 2. From independent samples, of 32 and 23 sets of values,
partial correlations of order four are found to be 0.4 and 0.6 respectively.
Examine (i) whether the first value is consistent with the assumption of a
normal population with a corresponding correlation of 0.7, and (ii) whether
the two samples may be regarded as from the same normal population.

(i) From Table 7 we find that the values $r_1 = 0.4$ and $\rho_0 = 0.7$
correspond to $z_1 = 0.424$ and $\zeta_0 = 0.868$. The deviation of $z_1$ is therefore 0.444.
The s.e. of $z_1$ is $1/\sqrt{32 - 4 - 3} = 0.2$. Since the deviation of $z_1$ is greater than
twice the s.e. it is significant. The assumption of a correlation of 0.7 in the
population is thus ruled out.

(ii) Corresponding to $r_1 = 0.4$ and $r_2 = 0.6$ we have $z_1 = 0.424$ and
$z_2 = 0.693$, giving $z_2 - z_1 = 0.27$ nearly. The s.e. of the difference of the z's
is given by

$$\epsilon = \sqrt{\tfrac{1}{25} + \tfrac{1}{16}} = \sqrt{0.1025} = 0.32.$$

The difference $z_2 - z_1$, being less than $\epsilon$, is not significant. The samples may
thus be regarded as from the same population.
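Both parts of this example can be reproduced without a table of z, since Fisher's transformation is the inverse hyperbolic tangent. The function names below are hypothetical, and the sketch simply reports each difference in units of its standard error.

```python
import math

def z_of(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)

def z_ratio_one_sample(r, rho0, n, k):
    """Deviation of z from its assumed value, in units of its s.e.,
    for a partial correlation of order k in a sample of n sets."""
    se = 1.0 / math.sqrt(n - k - 3)
    return (z_of(rho0) - z_of(r)) / se

def z_ratio_two_samples(r1, n1, r2, n2, k):
    """Difference of the two z's in units of the s.e. of the difference."""
    se = math.sqrt(1.0 / (n1 - k - 3) + 1.0 / (n2 - k - 3))
    return (z_of(r2) - z_of(r1)) / se

ratio_i = z_ratio_one_sample(0.4, 0.7, 32, 4)
ratio_ii = z_ratio_two_samples(0.4, 32, 0.6, 23, 4)
print(ratio_i, ratio_ii)  # about 2.2 and 0.84
```

A ratio numerically greater than about 2 corresponds to significance at roughly the 5% level, in agreement with the conclusions reached above.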

118. Significance of an observed multiple correlation

Consider next the significance of an observed multiple correlation
coefficient, R, of the variable $x_1$ with the p variables $x_2, \ldots, x_{p+1}$,
calculated from a random sample of N sets of values from a
multivariate normal population. Fisher has found the general sampling
distribution* of R, and has shown that it depends, not on the whole
matrix of correlations between the variables, but simply on the
multiple correlation in the population and the sample size, N.
In particular, when the multiple correlation coefficient in the
population is zero,† $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate. In virtue
of Theorem VII of § 72 it follows that, in this case, $R^2/(1 - R^2)$ is a
$\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate; and therefore, by the argument of § 92,
the statistic F defined by

$$F = \frac{R^2}{1 - R^2}\cdot\frac{N - p - 1}{p} \tag{46}$$

conforms to the F distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1. \tag{47}$$

To test the hypothesis that the multiple correlation in the population
is zero, we have only to determine from the table of F whether
the value of this statistic calculated from the sample is significant.
This decides the significance of the observed multiple correlation.
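The statistic (46) and the degrees of freedom (47) may be computed as follows. The function name is hypothetical, and the figures R = 0.4, N = 25, p = 3 are used merely as an illustration; the 5% value of F is to be taken from the table.

```python
def multiple_corr_F(R, n, p):
    """Statistic F of (46), with its degrees of freedom (47), for a multiple
    correlation R of one variable with p others in a sample of n sets."""
    df1, df2 = p, n - p - 1
    F = (R * R / (1 - R * R)) * df2 / df1
    return F, df1, df2

F, df1, df2 = multiple_corr_F(0.4, 25, 3)
print(F, df1, df2)  # F is 4/3 up to rounding, with 3 and 21 d.f.
```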
We may also remark that, since $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate
in samples from a normal population in which $x_1$ is uncorrelated
with any of the p variables $x_2, x_3, \ldots, x_{p+1}$, the mean value of $R^2$ is
given by

$$E(R^2) = \frac{p}{N - 1}. \tag{48}$$

The problem may also be approached from the point of view of
analysis of variance and covariance. The estimate $e_{1.(p)}$ of $x_1$, given
by the regression equation of $x_1$ on the p variables $x_2, \ldots, x_{p+1}$, is
of the form

$$e_{1.(p)} = b_{12}x_2 + b_{13}x_3 + \ldots + b_{1,p+1}x_{p+1},$$

and this is connected with the corresponding deviation $x_{1.(p)}$ by
the equation

$$x_1 = e_{1.(p)} + x_{1.(p)}. \tag{49}$$

* Fisher, 1928, 1. See also Wilks, 1932, 1.
† See also Fisher, 1924, 5.

The N values of $x_{1.(p)}$ are connected by the p + 1 normal equations

$$\sum x_{1.(p)} = 0, \qquad \sum x_ix_{1.(p)} = 0 \quad (i = 2, \ldots, p + 1). \tag{50}$$

Squaring (49) and summing over the distribution we see that

$$\sum x_1^2 = \sum x_{1.(p)}^2 + \sum e_{1.(p)}^2, \tag{51}$$

the sum of products vanishing since

$$\sum x_{1.(p)}e_{1.(p)} = \sum x_{1.(p)}(b_{12}x_2 + b_{13}x_3 + \ldots) = 0,$$

in virtue of (50). The sum in the first member of (51) is equal to
$NS_1^2$, where $S_1^2$ is the variance of $x_1$ in the sample; while the first
sum on the right has the value $NS_1^2(1 - R^2)$ by (23'). Consequently

$$\sum e_{1.(p)}^2 = NS_1^2R^2. \tag{52}$$

The equations (50) may be regarded as p + 1 linear constraints on
the N values $x_{1.(p)}$. Then, on the assumption that these deviations
are normally distributed, and that R is zero in the population, the
sum $\sum x_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with N − p − 1 d.f., $\sigma^2$ being the
variance of $x_1$ in the population. But, by § 77, $\sum x_1^2/\sigma^2$ is similarly
distributed with N − 1 d.f.; and therefore, by Theorem II of § 81,
$\sum e_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with p d.f. Thus the two sums on
the right of (51), when divided by N − p − 1 and p respectively, give
independent and unbiased estimates of $\sigma^2$. Inserting their values
in terms of $R^2$ we see that the quotient of these estimates

$$F = \frac{R^2}{1 - R^2}\cdot\frac{N - p - 1}{p}$$

conforms to the F distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1.$$

We thus arrive at the same result as before.* (See also Ex. XII, 13.)

Example. In a sample of 25 sets of values from a normal population
$R_{1(234)}$ was found to be 0.4. Show that this is not significant of correlation
in the population between $x_1$ and the variables $x_2$, $x_3$, $x_4$.

Here p = 3, $\nu_1 = 3$, $\nu_2 = 21$. Hence

$$F = \frac{0.16}{0.84} \times \frac{21}{3} = \frac{4}{3}.$$

This is not significant, since the 5% value of F is about 3.1.

* For the distributions of various statistics occurring in this chapter see
Bartlett, 1933, 3.

EXAMPLES XII

1. In a trivariate distribution it is found that

$$\sigma_1 = 3, \quad \sigma_2 = 4, \quad \sigma_3 = 5; \qquad r_{23} = 0.40, \quad r_{31} = 0.60, \quad r_{12} = 0.70.$$

Prove that the partial correlations are

$$r_{23.1} = -0.035, \qquad r_{31.2} = 0.49, \qquad r_{12.3} = 0.63,$$

and that, if the variates are measured from their means, the linear
regression equations are

$$x_1 = 0.41x_2 + 0.23x_3, \qquad x_2 = 0.96x_1 - 0.025x_3, \qquad x_3 = 1.05x_1 - 0.05x_2.$$

Show also that

$$\sigma_{1.23} = 1.87, \qquad \sigma_{2.31} = 2.85, \qquad \sigma_{3.12} = 4.00,$$
$$R_{1(23)} = 0.78, \qquad R_{2(31)} = 0.70, \qquad R_{3(12)} = 0.60.$$

2. Show that 

^1.230^2.31/^12.3 ~ 2/^129 

and, more generally.

3. From § 114 (38), deduce that

$$1 - r_{1n.23\ldots(n-1)}^2 = \frac{1 - R_{1(23\ldots n)}^2}{1 - R_{1(23\ldots n-1)}^2},$$

which shows how a partial correlation coefficient of order n − 2 may
be expressed in terms of multiple correlation coefficients of orders
n − 1 and n − 2.

4. Prove the identity

$$b_{12.3}b_{23.1}b_{31.2} = r_{12.3}r_{23.1}r_{31.2}.$$

5. Prove the formula

$$b_{12} = \frac{b_{12.3} + b_{13.2}b_{32.1}}{1 - b_{13.2}b_{31.2}},$$

expressing a regression coefficient in terms of coefficients of higher
order. Write down the corresponding formula with subscripts 1
and 2 interchanged, multiply together the two equations and take
the square root, thus obtaining

$$r_{12} = \frac{r_{12.3} + r_{13.2}r_{23.1}}{\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)}},$$

expressing a correlation coefficient in terms of coefficients of higher
order.

6. Prove the formulae of Ex. 5 with a group (k) of secondary
subscripts added to each of the coefficients.

7. Verify the values of $\phi$ expressed by § 116 (43) and (44).

8. Show that, if $x_3 = ax_1 + bx_2$, the three partial correlations are
numerically equal to unity, $r_{13.2}$ having the sign of a, $r_{23.1}$ the sign
of b, and $r_{12.3}$ the opposite sign to a/b.

9. For a sample of 30 sets of values from a normal population,
$R_{1(23)}$ was found to be 0.5. Show that this is significant of correlation
in the population between $x_1$ and $x_2$, $x_3$. (F = 4.5, $\nu_1 = 2$, $\nu_2 = 27$.)

10. Show that a partial correlation $r_{12.34} = 0.5$, in a sample of
20 sets of values from a normal population, is significant at the 5%
level.

11. Two independent samples, of 46 and 36 sets of values, give
corresponding partial correlations of order three as 0.41 and 0.66
respectively. Show that this is not inconsistent with the assumption
that the samples are from the same normal population. Show also
that an estimate of the partial correlation in the population, by the
method of § 96, is 0.53.

12. Show that, corresponding to the first sample in § 117, Ex. 2,
the 95% fiducial limits for the partial correlation in the population
are 0.03 and 0.67.

13. Deduce from the argument on p. 259 that, when the multiple
correlation in the normal population is zero, $R^2/(1 - R^2)$
is a $\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate and therefore, by § 72, $R^2$ is a
$\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate.

14. With the notation of §118 show that, when the multiple 
correlation in the normal population is zero, the expected value of 
R in the sample is /"(Kp-f l))r(^{N— l))/r(ip) 
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